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EXECUTIVE  SUMMARY 


Airspace  encounter  models  capture  the  complex  geometry  and  behavior  of  aircraft 
after  standard  separation  assurance  has  failed.  Encounter  models  are  essential  to 
unmanned  aerial  system  (UAS)  sense  and  avoid  (SAA)  system  safety  evaluation.  A 
SAA  system  with  a  demonstrated  level  of  safety  using  extensive  simulation  and  flight 
test  evaluations  will  be  required  for  unmanned  aircraft  to  gain  greater  access  to  the 
National  Airspaee  System  (NAS).  The  Department  of  Homeland  Security  (DHS) 
strives  to  operate  unmanned  aireraft  in  the  littoral  regions  of  the  continental  United 
States  (CONUS),  but  recent  encounter  models  do  not  explicitly  distinguish  littoral 
traffic  from  other  traffie.  The  framework  and  data  used  to  build  previous  airspace 
encounter  models  for  the  NAS  were  used  to  develop  encounter  models  for  the  littoral 
regions  of  the  NAS.  Since  this  existing  framework  was  leveraged,  the  littoral  models 
were  direetly  compared  to  existing  models  to  ensure  that  they  had  both  sufficient 
data  and  distinctive  characteristics. 

The  primary  product  of  this  analysis  was  the  development  of  a  littoral  correlated  and 
uncorrelated  model  for  use  in  SAA  system  evaluation.  In  addition,  a  methodology 
was  developed  to  determine  whether  the  models  were  both  distinctive  from  their 
CONUS  counterparts  and  sufficient  in  terms  of  the  quantity  of  data  used  to  build 
the  models.  Application  of  this  methodology  demonstrated  that  both  the  littoral 
correlated  and  uncorrelated  model  characteristics  were  different  than  the  previous 
CONUS  models  and  that  sufficient  data  were  used  to  build  the  models.  Specifically, 
it  was  found  that  aircraft  flying  under  visual  flight  rules  in  the  littoral  environment 
have  higher  airspeeds  and  were  observed  at  higher  altitudes.  Aircraft  in  littoral 
correlated  encounters  have  higher  airspeeds  but  encounters  tend  to  occur  at  lower 
altitudes  than  those  in  the  NAS.  Lastly,  this  analysis  determined  that  although 
the  encounter  initial  conditions  tend  to  be  different  in  a  littoral  environment,  the 
aircraft  dynamic  behavior,  or  how  the  various  aireraft  states  transition  during  the 
encounter,  remains  the  same. 

Several  future  needs  were  identified  regarding  the  airspace  encounter  model  effort 
to  support  UAS  SAA  development  and  certification  for  current  and  future  DHS 
missions: 


1.  Develop  and  demonstrate  a  methodology  for  characterizing  the  littoral  non- 
cooperative  density  using  existing  radars.  This  is  a  required  parameter  for 
estimating  the  expected  encounter  rate  and  subsequently  estimating  the  near 
mid-air  collision  (NMAC)  rate — required  for  a  target  level  of  safety  safety  anal¬ 
ysis. 

2.  Update  the  littoral  model  characteristics  as  the  airspace  changes.  The  Fed¬ 
eral  Aviation  Administration’s  Next  Generation  Air  Transportation  System 
(NextGen)  and  the  integration  of  unmanned  aerial  systems  into  the  NAS  will 
likely  change  both  the  traffie  density  and  the  attributes  of  the  encounter  models. 
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This  will  require  monitoring  the  airspace  continuously,  and  releasing  updates 
as  the  model  parameters  change. 

3.  DHS  maritime  UASs  will  likely  operate  beyond  the  littoral  environment.  Thus, 
encounter  and  density  models  specific  to  the  oceanic  environment  should  be 
formed.  This  effort  will  require  additional  sensors  that  have  coverage  of  the  de¬ 
sired  operating  environment  and  that  extend  beyond  those  used  for  the  CONUS 
and  littoral  models. 
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1.  INTRODUCTION 


Airborne  safety  critical  systems  must  undergo  extensive  validation  under  realistic  conditions  prior 
to  certification  in  the  National  Airspace  System  (NAS).  Collision  avoidance  systems  for  manned 
aircraft,  and  the  associated  sense  and  avoid  (SAA)  systems  for  unmanned  aerial  systems  (UASs), 
provide  a  safety  critical  function — ensuring  separation  when  other  safety  layers  (e.g.,  the  underlying 
airspace  structure,  airspace  procedures,  and  air  traffic  control),  have  failed  to  maintain  separation. 
These  complex  systems  are  assessed  in  realistic  large  scale  Monte  Carlo  simulations  and  flight  tests 
to  prove  that  they  meet  the  desired  levels  of  safety.  Fundamental  to  these  simulations  is  the  use 
of  realistic  encounter  situations  between  aircraft  which  are  defined  by  the  relative  geometry  and 
behavior  of  the  aircraft  during  the  encounter.  The  geometry  and  dynamic  behavior  of  aircraft 
during  encounters  are  captured  in  encounter  models  that  provide  a  statistically  sufficient  set  of 
features  which  are  estimated  from  a  large  collection  of  observed  encounter  events.  The  initial  and 
continuing  evaluation  of  the  Traffic  Alert  and  Collision  Avoidance  System  (TCAS)  for  manned 
aircraft  illustrates  the  necessity  of  encounter  models  to  estimate  system  effectiveness  in  a  wide 
variety  of  encounter  geometries  [1-4].  This  system  evaluation  methodology  using  encounter  models 
for  system  assessment  is  required  by  ICAO  when  estimating  TCAS  safety  risk  [5]. 

There  are  fundamentally  two  types  of  encounters:  correlated  and  uncorrelated.  In  the  first  type,  at 
least  one  aircraft  in  the  encounter  is  receiving  air  traffic  control  (ATC)  services,  and  both  aircraft  are 
transponder  equipped.  In  the  second  type,  ATC  services  would  no  longer  be  provided  because  either 
one  aircraft  is  not  transponder  equipped  or  both  aircraft  are  flying  under  visual  flight  rules  (VFR). 
The  first  type  is  termed  “correlated'’  because  there  is  active  coordination  provided  by  ATC  prior 
to  the  loss  of  separation.  It  is  therefore  critical  that  the  model  capture  this  coordination  between 
aircraft  prior  to  the  loss  of  separation.  In  the  second  type  of  encounter,  a  lack  of  ATC  services 
results  in  uncoordinated  loss  of  separation,  hence  these  encounters  are  termed  “uncorrelated"  i.e.. 
aircraft  blunder  into  one  another.  This  uncorrelated  feature  is  exploited  by  modeling  each  aircraft 
individually  and  then  simulating  the  uncoordinated  loss  of  separation. 

Lincoln  Laboratory  recently  built  a  set  of  correlated  and  uncorrelated  encounter  models  for  con¬ 
ventional  and  unconventional  aircraft  [6-8].  Unconventional  aircraft  are  defined  here  as  aircraft 
unlikely  to  carry  transponder  equipment — e.g.,  gliders,  ultralights,  balloons  [9],  These  models  were 
intentionally  created  for  the  continental  United  States  (CONUS)  from  a  nearly  national  coverage  of 
radar  sensors.  Data  from  these  134  radars,  composed  of  long-range  ARSR-4  and  short-range  ASR- 
8,  ASR-9,  and  ASR-11  radars,  were  obtained  through  the  U.S.  Air  Force’s  84th  Radar  Evaluation 
Squadron  (RADES)  at  Hill  Air  Force  Base  in  Utah.  The  theoretical  horizontal  sensor  coverage  is 
shown  in  Figure  1  this  illustration  assumes  the  sensor  specification  for  detection  range  without 
terrain  masking.  The  data  in  the  original  uncorrelated  model  were  collected  during  the  time  periods 
spanning  1  7  December  2007  and  1-7  June  2008  while  the  data  for  the  correlated  model  spanned 
December  1,  2007  through  August  31,  2008.  Although  the  data  to  support  the  uncorrelated  model 
were  captured  over  the  identical  time  period  as  for  the  correlated  model,  the  data  quantity  during 
this  two  week  time  period  was  determined  to  he  sufficient.  These  original  encounter  models  are 
termed  the  “CONUS”  models. 

A  SAA  system  which  meets  a  specified  level  of  safety  will  be  required  for  consistent  and  widespread 
UAS  operations  in  the  NAS  [10].  Encounter  models  are  used  to  characterize  the  performance  of 
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Figure,  1.  Radar  coverage  map . 

the  SAA  system  in  the  expected  UAS  operational  environment.  The  Department  of  Homeland 
Security  (DHS)  has  identified  the  need  for  UAS  operations  in  the  littoral  regions  of  the  NAS  to 
support  the  missions  of  the  U.S.  Coast  Guard  and  Customs  and  Boarder  Protection  [11].  Because 
of  this  identified  UAS  need,  and  the  requirement  to  characterize  the  airspace  for  proving  SAA 
system  effectiveness,  an  encounter  model  specific  to  the  littoral  regions  is  desired. 

Although  the  CONUS  encounter  models  implicitly  incorporate  the  littoral  regions  of  the  NAS,  they 
do  not  separate  the  littoral  region  explicitly  and  do  not  therefore,  capture  the  differences  that  may 
be  expected  in  the  encounter  characteristics.  For  example,  it  is  expected  that  a  littoral  model  would 
capture  terminal  traffic  when  a  major  airport,  or  its  arrival  and  departure  routes,  is  located  within 
or  near  the  littoral  region  (e.g.,  Boston,  Los  Angeles,  Miami).  Because  of  the  oceanic  proximity  of 
these  terminal  areas,  there  may  be  different  encounter  characteristics  than  terminal  areas  residing 
elsewhere.  In  general,  littoral  air  traffic  is  expected  to  be  less  dense  than  other  CONUS  regions 
and  have  fewer  instances  of  aircraft  training  and  aerobatics. 

The  primary  objective  of  this  report  is  to  present  a  methodology  for  forming  correlated  and  uncor¬ 
related  encounter  models  specific  to  the  littoral  regions  of  the  NAS.  These  models  are  created  using 
a  similar  methodology  and  the  data  previously  obtained  and  processed  into  the  CONUS  models. 
These  models  will  be  made  publicly  available  and  in  a  format  compatible  with  the  CONUS  mod¬ 
els.  Although  discrete-  and  1200-code  (VFR)  traffic  is  observed  in  the  littoral  regions,  very  little 
unconventional  aircraft  traffic  was  observed.  Hence,  the  focus  of  this  document  is  the  creation  of 
a  correlated  and  uncorrelated  conventional  model.  The  uncorrelated  model  capturing  conventional 
(VFR)  traffic  will  be  referred  to  as  the  uncorrelated  model  for  brevity.  Because  the  data  span  the 
same  duration  and  the  geographic  domain  remains  the  same  for  the  littoral  and  CONUS  models, 
a  separate  description  of  encounter  and  traffic  density  is  unneeded. 

Before  describing  the  approach  for  building  the  models,  a  definition  of  the  lit  toral  regions  is  required. 
The  Department  of  Defense  (DoD)  defines  the  littoral  environment  as  follows  [12]. 
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The  littoral  comprises  two  segments  of  operational  environment: 

1.  Seaward:  the  area  from  the  open  ocean  to  the  shore,  which  must  be  controlled  to 
support  operations  ashore. 

2.  Landward:  the  area  inland  from  the  shore  that  can  be  supported  and  defended 
directly  from  the  sea. 

Because  this  model  is  primarily  intended  for  use  when  developing  and  certifying  UAS  deployed  to 
support  the  DBS  maritime  mission,  the  littoral  environment  is  broadly  defined  as  the  area  from 
the  U.S.  coastline  oceanward  to  international  waters  (approximately  200  NM  from  the  coastline). 


1.1  DOCUMENT  OVERVIEW 

This  report  describes  the  process  for  modifying  the  existing  CONUS  encounter  models  to  reflect 
the  littoral  airspace.  This  document  does  not  describe  the  complete  process  for  processing  the 
radar  data  into  the  encounter  models,  but  the  process  is  identical  to  that  described  in  the  reports 
detailing  the  CONUS  models.  This  document  also  assumes  that  the  reader  is  familiar  with  the 
process  for  creating  and  subsequently  using  the  encounter  models.  For  a  brief,  high  level  discussion 
of  the  models,  refer  to  [13  15].  The  main  text  of  this  document  does  not  discuss  Bayesian  networks, 
although  Appendix  E  is  included  to  describe  the  associated  nomenclature. 

Section  2  first  presents  the  approach  for  filtering  the  radar  data  to  extract  littoral  data.  This 
section  then  presents  the  methodology  for  comparing  the  CONUS  model  to  the  littoral  model  and 
to  confirm  the  distinct  use  of  the  littoral  models.  It  concludes  with  a  discussion  on  determining 
the  sufficiency  of  the  data  collected  by  presenting  metrics  for  model  convergence. 

Sections  3  and  4  describe  the  models  that  were  generated  and  then  carries  out  the  model  compar¬ 
ison  and  model  convergence  methodology  of  Section  2  for  the  uncorrelated  and  correlated  models, 
respectively.  The  candidate  model  structures,  graphically  representing  the  conditional  dependences 
between  the  model  variables,  are  presented  in  Appendix  I). 


3 


This  page  intentionally  left  blank. 


2.  APPROACH 


This  section  describes  the  methodology  for  determining  which  observed  aircraft  tracks  and  en¬ 
counters  are  considered  littoral.  The  section  subsequently  discusses  the  selected  procedure  for 
quantitatively  comparing  the  CONUS  and  littoral  models  and  for  determining  that  sufficient  data 
were  used  to  build  the  models. 


2.1  LITTORAL  SPATIAL  FILTERING 

The  littoral  environment  is  defined  in  Section  1.1  as  the  airspace  extending  from  the  coastline  to 
international  waters.  Two  slight  modifications  were  made  to  this  definition  when  filtering  tracks 
and  encounters  for  the  models.  First,  a  small  buffer,  corresponding  to  1.67  •  10-2  deg  of  latitude  and 
longitude  (approximately  1  NM)  was  added  to  the  coastline  in  order  to  remove  portions  of  tracks 
that  transit  small  inlets  (less  than  2NM  wide)  along  the  coast,  including  bays  with  small  inlets 
e.g.,  San  Fransisco  Bay.  Second,  no  specific  limit  was  placed  in  the  distance  from  the  coastline. 
Although  no  specific  limit  was  placed  on  the  distance  from  the  coastline,  there  is  a  realistic  limitation 
provided  by  the  maximum  range  of  the  sensors.  For  long-range  Air  Route  Surveillance  Radars 
(ARSR-4)  surrounding  the  edge  of  the  CONUS,  the  maximum  range  specification  for  primary 
and  secondary  (beacon)  surveillance  is  approximately  250  NM.  Encounters  have  been  observed  out 
to  approximately  450  NM  at  an  altitude  of  35,000  ft  with  secondary  surveillance,  although  this 
relatively  long  450  NM  range  does  not  extend  to  sea  level. 

The  World  Vector  Shoreline  Plus  coastline  product  from  the  National  Imagery  and  Mapping  Agency 
was  used  to  identify  the  coastline  [16].  The  highest  resolution  1:250,000  scale  source  specification 
requires  that  90  percent  of  all  identifiable  shoreline  features  be  located  within  500  meters.  Oceanic 
islands,  including  the  Caribbean  islands,  were  classified  as  littoral.  To  create  the  buffer  zone  as 
described  above,  the  1:12,000,000  scale  product  was  used  to  reduce  computation  time,  and  the 
buffer  was  created  such  that  all  coastal  points  were  at  least  1  NM  from  the  buffer.  An  illustration 
of  the  coastline  data  with  the  buffer  is  shown  in  Figure  2.  Once  filtered  with  the  littoral  boundary, 
the  remaining  tracks  are  processed  into  the  encounter  models.  Tracks  with  littoral  duration  less 
than  30s  were  also  removed  from  processing.  After  filtering,  10%  of  encounters  and  3.8%  of  1200- 
code  flight  hours  passed  the  filter  criteria — corresponding  to  40,985  encounters  and  2049  1200-code 
flight  hours,  respectively.  Because  of  t  he  relatively  small  data  set  used  to  build  the  littoral  models 
when  compared  to  the  CONUS  models,  the  question  of  whether  adequate  data  were  included 
within  the  model  arises.  It  should  be  noted  that  although  only  10%  of  encounters  were  classifed 
as  littoral,  these  40,985  encounters  represent  an  order  of  magnitude  increase  in  the  number  of 
encounter  over  previous  models  17  22].  The  process  for  determining  this  adequacy,  in  terms  of  the 
model  convegerence,  will  be  discused  in  the  following  section. 


2.2  MODEL  CONVERGENCE 

After  spatial  filtering  to  determine  which  encounters  and  1200-code  tracks  are  in  the  littoral  envi¬ 
ronment,  the  littoral  models  must  make  use  of  a  significantly  smaller  data  set  than  was  used  for 


Figure  2.  Littoral  buffer  for  the  Eastern  Coast  of  Massachusetts.  The  littoral  buffer  was  implemented  to 
remove  tracks  transiting  small  coastal  inlets  and  bays.  The  coastline  is  the  1:250,000  product  while  the  buffer 
was  created  using  the  1:12,000,000  scale  product. 

the  CONUS  models.  Therefore,  in  addition  to  comparing  the  littoral  and  CONUS  models  to  deter¬ 
mine  if  they  are  similar,  it  must  be  determined  that  there  are  sufficient  data  in  the  littoral  model. 
This  is  determined  by  comparing  the  convergence  of  (1)  the  parameters  in  the  model  and  (2)  the 
probability  that  an  encounter  leads  to  a  near  mid-air  collision  (NMAC)  as  data  are  added  to  the 
model.  The  F(nmac  |  enc)  is  the  primary  characteristic  that  is  used  to  determine  the  effectiveness 
of  a  collision  avoidance  system  using  the  encounter  models,  and  it  is  therefore  critical  that  this 
value  converges. 

In  general,  the  convergence  process  is  accomplished  by  adding  fixed  quantities  of  data  (subsets 
of  the  total  data  set)  and  monitoring  the  convergence  of  the  key  parameters.  There  is  not  a 
quantitative  metric  for  unequivocally  concluding  that  the  necessary  metric  converged.  Thus,  the 
convergence  parameters  are  presented  against  those  for  the  CONUS  model  and  then  discussed. 
This  presentation  of  the  convergence  is  sometimes  termed  the  convergence  curve.  It  is  assumed 
that  the  parameters  for  the  CONUS  model  converged,  so  direct  comparison  is  made.  The  full  data 
was  split  up  into  32  approximately  equal  subsets  for  the  convergence  process.  These  individual 
subsets  are  denoted  with  superscript  t  while  the  cumulative  sum  of  the  data  from  the  first  subset 
through  to  subset  t  is  denoted  the  (larger)  data  subset  T. 

2.2.1  Model  Parameter  Convergence 

The  first  convergence  metric  is  formed  from  the  individual  parameters  that  make  up  the  Bayesian 
network — 6^  =  P(Aj  =  k  |  i r^).  Note  that  the  nomenclature  as  defined  in  Appendix  E  is  used 
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in  this  section.  Using  a  metric  that  is  dependent  on  the  model  parameters  makes  the  metric  in¬ 
dependent  of  the  model  implementation-  in  this  case,  the  simulation  methodology.  When  these 
parameters  converge,  the  many  probability  distributions  that  make  up  the  model  necessarily  con¬ 
verge.  The  discrepancy  between  the  parameters  as  additional  data  subsets  are  added  is  simply  the 
difference  between  the  parameters:  Although  this  discrepancy  may  be  used  directly 

to  determine  the  model  convergence,  there  are  instances  when  the  discrepancy  may  be  skewed. 
This  occurs  when  the  probability  of  a  parental  instantiation  is  low.  For  example,  if  a  certain  in¬ 
stantiation  of  the  parents  only  occurs  once  in  the  data,  then  the  change  in  the  parameters  given 
that  instantiation  would  be  large  even  though  the  probability  of  the  parental  instantiation  is  low. 
Hence,  the  joint  probability  of  the  variable  and  the  parental  instantiation,  which  weights  each  of 
the  parameters  by  the  probability  of  the  parental  instantiation,  is  used  to  determine  convergence: 

4>i]k  =  P(Xi,  m)  =  P{Xi  I  7ru)P(7Tij)  =  ,  (1) 

where  Nt  is  the  number  of  data  points  for  variable  i.  Note  that  the  actual  parameters  of  the  model 
are  computed  by  normalizing  each  individual  count  by  the  number  of  counts  for  each  parental 
instantiation  Since  the  models  are  composed  of  complete  data  (data  without  missing 

variables),  then  Ni  is  equivalent  to  the  total  number  of  data  points  (Ar)  used  for  building  the  model. 
Since  there  are  many  parental  instantiations,  and  thus  many  thousands  of  parameters,  the  mean 
of  the  difference  between  the  joint  probabilities  is  used  as  a  summary  statistic: 

=  (2) 

2^i-i 1  i<h  tjk 

The  joint  probability  discrepancy  l)  will  necessarily  decrease  as  each  equally  sized  sul>set 

is  added  because  the  ratio  of  the  number  of  counts  added  at  each  iteration  to  the  counts  already 
in  the  model  decreases  as  t  increases.  This  effect  will  be  illustrated  using  the  following  example. 
Suppose  that  there  exists  a  model  with  one  variable  which  can  take  on  two  values,  A  and  B, 
and  400  data  points  exist  for  building  the  model.  To  determine  convergence,  the  400  points  are 
separated  into  four  equally  sized  subsets.  Each  subset  has  exactly  50  instantiations  of  A  and  50 
of  B,  although  this  is  unknown  a  priori.  In  addition  to  computing  the  convergence  parameters 
(<f>ijk)  at  each  iteration,  the  maximum  that  each  of  these  parameters  can  change  given  the  data 
that  are  added  at  each  iteration  is  also  computed.  This  quantity  is  denoted  \4>J]k  nlax  —  <t\jk 1 1, 
where  (f)Jjk  max  is  the  value  that  maximizes  this  quantity.  This  example  is  show  in  Table  1.  As 
equal  amounts  of  data  are  added  (100  points),  the  maximum  possible  difference  between  the  model 
parameters  decreases  from  0.25  to  0.13.  To  counteract  this  effect  when  adding  equal  quantities  of 
data,  Equation  2  is  normalized  by  the  mean  of  the  maximum  possible  parameter  discrepancy, 

^0T,max  ”  *  7  ^  >  l^ijfc,max  —  ^ijk  I  1  00 

mi  ijk 

where  the  term  \(p[jk  max  —  <^^1|  is  defined  as  the  maximum  value  that  the  joint  probabilities  can 
change  given  the  parental  instantiations  that  are  observed  with  subset  t.  This  maximum  is  found 
when 

jyT.  =  I  NW  1  +  Nh  when  ‘  = 

|  0  otherwise 
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TABLE  1 


Example  illustrating  the  maximum  that  the  individual  model  parameters  can  change 
given  that  constant  sized  data  subsets  are  added  to  the  model. 


T 

nt 

iyA 

<PTb 

$  A,  max 

max 

I^Amax  &A 

1 

50 

50 

0.5 

0.5 

- 

- 

- 

- 

2 

100 

100 

0.5 

0.5 

0.75 

0.25 

0.25 

0.25 

3 

150 

150 

0.5 

0.5 

0.07 

0.33 

0.17 

0.17 

4 

200 

200 

0.5 

0.5 

0.63 

0.38 

0.13 

0.13 

where  the  quantity  min^  N?k  1  is  the  minimum  probability  of  the  discrete  distribution  P(x*  \  n ry) 
for  the  data  summation  T  —  1.  In  other  terms,  this  maximum  value  is  found  when  all  of  the  data 
for  a  parental  instantiation  at  the  current  iteration  is  added  to  the  parameter  with  the  lowest  value 
following  the  previous  data  iteration.  This  metric,  Sfo/bcf) r,max>  is  termed  the  “normalized  mean 
convergence”,  and  qualitatively  is  a  summary  of  how  much  the  model  parameters  actually  deviate 
compared  to  the  maximum  amount  that  the  parameters  can  change.  Considering  only  the  error 
caused  by  sampling  from  the  actual  population  of  encounters,  this  metric  is  approximately  constant 
over  the  number  of  iterations.  When  there  are  multiple  disparate  populations  and  one  subset  does 
not  capture  adequate  samples  from  each  population,  then  the  metric  will  be  observed  to  converge 
to  a  non- zero  value  with  increasing  data.  The  former  behavior  illustrates  that  an  increased  amount 
of  data  only  reduces  the  error  in  estimating  the  discrete  distributions,  while  the  latter  convergence 
behavior  indicates  that  increased  data  also  provide  additional  information.  The  normalized  mean 
convergence  will  not  converge  to  zero  because  the  parameters  in  the*  model  will  almost  always 
change  due  to  sampling  error  when  additional  data  are  added. 

2.2.2  NMAC  Probability  Convergence 

The  process  for  determining  the  P(nmac  |  enc)  is  different  for  the  uncorrelated  and  correlated 
models.  For  the  correlated  model,  the  expected  P(nmac  |  enc)  is  encoded  directly  in  the  model  = 
P(vmd  <  100ft,  hmd  <  500ft  |  enc).  When  computing  this  probability,  a  uniform  geographic 
assumption  (over  airspace  class  and  altitude  layer)  was  assumed  to  remove  the  effects  of  differing 
geographic  distributions  between  the  CONUS  and  littoral  models  (see  Eq.  8).  The  P(nmac  |  enc) 
for  the  uncorrelated  model  must  be  determined  through  simulation  since  it  is  not  explicitly  defined 
within  the  model.  Although  simulation  must  be  used  to  determine  the  NMAC  probability,  for 
the  specific  case  of  non-maneuvering  aircraft  (i.e.,  not  accelerating,  turning,  or  climbing),  the 
P(nmac  |  enc)  without  a  collision  avoidance  system  is  rnmac/renc  when  the  ratio  of  cylinder  height 
to  width  is  the  same  for  both  the  encounter  cylinder  and  the  NMAC  cylinder.  For  the  uncorrelated 
simulation,  an  encounter  cylinder  with  a  height  of  2500  ft  and  a  width  of  12,500  ft  was  assumed  (25 
times  the  NMAC  cylinder  dimensions).  Thus,  the  expected  non-maneuvering  NMAC  probability 
is  1/252  or  1.6-  11T3. 

The  direct  sampling  method  for  determining  the  NMAC  probability  convergence  curve  for  the 
uncorrelated  model  would  require  simulating  32  sets  of  encounters,  one  for  each  addition  of  data. 
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In  practice,  upwards  of  one  million  encounters  are  required  for  each  simulation,  so  the  convergence 
determination  process  would  required  32  million  simulations.  Instead,  importance  sampling  [23] 
is  used  to  reduce  the  computational  requirement  by  creating  one  set  of  encounters  for  the  model 
with  the  full  data  set  and  subsequently  weighting  the  NMAC  probability  according  the  probability 
that  the  encounter  would  have  resulted  from  each  of  the  32  submodels.  The  Bayesian  network 
model,  including  the  graphical  structure  ((7)  and  the  parameters  (0),  is  denoted  B.  B  without  a 
superscript  denotes  the  model  with  the  full  set  of  data.  The  importance  sampling  procedure  when 
determining  the  NMAC  probability  is  estimated  using 

P(nmac  |  enc,  Br)  =  jjYi  P(nniac  I  enc>  g)~^  'nc\B)  '  ^ 

The  term  /■’(enc  |  B™)/P(e nc  |  B)  is  termed  the  importance  sampling  weight  and  is 

P(enc  1  flr)  P(xji)  |  Br)P(x^}  |  £r) 

P(enc  |  B)  p(xW  |  B)P{xf  \  B) 

where  and  x2  are  the  model  samples  for  encounter  i  and  Aircraft  1  and  Aircraft  2,  respectively. 
P(xl))  is  the  joint  probability  for  the  sample  i  from  the  posterior  distribution  of  the  encounter 
models  (see  Eq.  E-l).  The  full  formulation  of  P(x^)  is  the  combination  of  the  joint  probabilities 
for  the  initial  and  transition  Bayesian  networks,  although  only  the  initial  network  is  considered  for 
this  analysis.  Ignoring  the  transition  network  is  useful  for  numerical  convenience,  since  considering 
the  transition  network  after  a  few  hundred  time  steps  would  make  each  probability  computatiorialy 
inconsequential.  Furthermore,  the  joint  probability  is  only  considered  over  the  non-geographic  bins 
to  remove  discrepancies  in  the  probabilities  caused  by  different  geographical  distributions.  These 
assumptions  result  in  P(x^)  simplifying  to  P(v^l\  v^l\  h^l\  where  the  variables  are  defined  at 
encounter  initialization.  It  should  be  noted  that  the  original  method  for  initializing  and  weighting 
uncorrelated  encounters  to  account  for  not  using  the  appropriate  sampling  distributions  (as  in  7]) 
has  been  updated  and  is  described  in  Appendix  C.  In  addition  to  visually  inspecting  and  discussing 
the  results,  the  iteration  where  the  NMAC  probability  estimate  falls,  and  remains,  within  1%  and 
5%  of  the  final  estimate  is  presented. 

2.3  MODEL  COMPARISON 

One  of  the  primary  assumptions  for  forming  separate  littoral  models  is  that  the  encounter  charac¬ 
teristics  are  sufficiently  dissimilar  from  the  CONUS  models  to  warrant  using  the  separate  models. 
Thus,  an  effort  is  undertaken  in  this  section  to  develop  a  methodology  to  validate  this  assumption 
by  first  comparing  the  individual  feature  distributions,  then  the  complete  models  themselves. 

2.3.1  Marginal  Distribution  Comparison 

A  basic  quantitative  comparison  between  two  discrete  probability  distributions  is  to  directly  com¬ 
pare  the  model’s  individual  feature  distributions — e.g.,  airspeed,  vertical  rate.  Equation  7  displays 
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the  method  for  computing  the  marginal  distribution  of  the  variable  instantiation  xx  given  the  con¬ 
ditional  probability  of  the  variable  given  the  jth  parental  instantiation  and  the  probability  of 
the  parental  instantiation. 


P(xi)  =  '£P(Xi  =  k\i'ij)P(nii) 


(7) 


It  is  possible  that  the  marginal  feature  distributions  for  two  regions  may  differ  based  solely  on  the 
different  airspace  structure — i.e.,  the  airspace  class  (A)  and  altitude  layer  (L)  distributions  may 
vary,  resulting  in  a  perceived  change  in  the  overall  aircraft  behavior.  For  example,  if  the  New  York 
City  airspace  is  compared  to  the  North  Dakota  airspace,  differences  are  expected  in  the  feature 
distributions  which  is  likely  caused  by  differences  in  the  airspace  structure — the  New  York  City 
airspace  is  saturated  with  terminal  areas  and  a  high  aircraft  density  in  Class  A  airspace  while  the 
North  Dakota  airspace  is  mostly  Class  E  and  G  airspace  below  18,000  ft  mean  sea  level  (MSL)  with 
a  small  amount  of  transiting  (en  route)  traffic.  Thus,  Equation  8  is  used  to  remove  this  factor  from 
the  analysis,  where  Qa,l  is  the  number  of  A  and  L  parental  instantiations  for  variable  i. 

Pohi(xi)  =  1  J^P(xi\AL)  (8) 

^  xt 


This  formation  essentially  assumes  an  objective  (uniform)  probability  distribution  over  A  and  L 
the  probability  of  each  airspace  class  and  altitude  layer  is  assumed  equal.  An  alternative  is  to 
assume  the  A  and  L  distribution  for  one  of  the  models  although  the  metric  as  formulated  ignores 
any  dependence  on  these  variables. 

Classically,  the  Pearson’s  y2  goodness-of-fit  test  has  been  used  to  assess  whether  the  difference 
between  the  expected  and  sampled  univariate  distributions  is  the  result  of  sampling  error  [24]: 


K 


=  £ 

k=l 


(VkjS  -  njcjtf 

nk,E 


(9) 


where  K  is  the  number  of  discrete  bins,  is  the  kth  bin  frequency  (counts),  and  the  subscripts  S 
and  E  denote  the  sampled  and  expected  distributions,  respectively.  This  formulation  assumes  that 
the  x2  distribution  has  K  —  1  degrees  of  freedom.  For  this  discussion,  the  expected  distribution 
will  be  that  for  the  CONUS  model  while  the  sampled  will  be  that  of  the  littoral  model.  Using 
this  formulation  of  the  goodness-of-fit  test  assesses  the  null  hypothesis  that  the  difference  between 
the  expected  distribution  (CONUS)  and  sampled  distribution  (littoral)  is  caused  by  sampling  error 
alone.  This  test  was  originally  intended  for  relatively  small  sample  sizes:  as  the  number  of  samples 
increases  beyond  approximately  10,000,  x2  becomes  highly  sensitive  to  minor  deviations  between 
the  two  distributions  such  that  the  deviations  may  become  statistically  significant  according  to  the 
test  [25].  To  illustrate  this  effect  and  the  amount  of  deviation  that  the  test  allows,  Equation  9  is 
solved  for  the  deviance,  defined  according  to 


d(x2)  =  I  Ps~Pe\  = 


nK2  ' 


(10) 


where  p  is  the  bin  frequency  (n*.)  normalized  by  the  sample  size  n.  The  deviance  is  defined  as 
the  assumed  constant  relative  frequency  difference  between  the  expected  and  sampled  distributions 
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when  the  expected  distribution  ( p #)  is  assumed  uniform.  The  subscript  k  is  excluded  in  Equation  10 
to  stress  that  the  deviance  assumes  that  the  difference  between  each  frequency  is  constant  for  each 
bin.  Furthermore,  this  metric  should  not  be  confused  with  the  expected  value  of  the  bin  frequency 
difference.  Although  d{x2)  provides  insight  into  the  x 2  test’s  allowable  difference  between  the 
distributions,  Equation  9  is  again  modified,  this  time  assuming  that  the  total  difference  between 
distributions  is  associated  with  a  single  bin.  Although  this  is  not  practically  possible  since  the 
relative  frequencies  ( p )  must  sum  to  unity  for  each  distribution,  this  formulation  of  the  deviance 
places  an  upper  bound  on  the  allowable  difference  for  each  bin.  This  formulation  is  termed  the 
maximum  deviance,  and  is  given  by 


^max(X  ) 


nl\ 


>  argmax(|p*,s  -  I 


(11) 


Observe  that  the  constructions  for  d(x2)  ail(l  dmax(x2)  differ  by  \J\  K.  This  definition  of  the 
deviance  should  not  be  confused  with  the  Bayesian  deviance  which  is  also  used  to  compare  the  fit 
of  probabilistic  models  [26].  For  a  sample  size  of  10,000,  assuming  a  p- value  of  5%,  with  seven  bins, 
d(x2)  =  5.1  •  10~3  and  dmax(x2)  =  1.34  •  10“2,  indicating  that  very  small  deviances  would  result 
in  rejecting  the  hypothesis  that  the  distributions  were  equivalent.  Hamada  et  al.  [27]  recommend 
selecting  the  number  of  bins  proporational  to  the  sample  size,  or  K  ^  n°  4,  when  using  a  goodness- 
of-fit,  test.  Since  the  bins  are  already  defined  for  the  encounter  models,  the  sample  size  used  for 
calculation  of  the  x2  statistic  may  be  modified,  resulting  in  a  goodness-of-fit  test  of  the  formulation 


X 


2 


y*  (P^S  ~  Pk,E)2 


(12) 


where  nm  is  the  modified  sample  size  (nm  =  K2 ■°  according  to  Hamada  et  al.).  The  deviance  and 
maximum  deviance  for  nm  —  /C2  5  and  nm  —  1000  as  a  function  of  the  number  of  bins  are  shown 
in  Figure  3  to  illustrate  the  difference  between  the  two  effective  sample  size  assumptions. 

Since  the  CONUS  and  littoral  models  are  compared  directly,  the  true  sample  size  of  the  marginal 
littoral  distribution  is  the  number  of  data  points  filtered  by  the  littoral  buffer  (7.38  •  106).  The 
standard  x2  test  would  almost  certainly  reject  the  two  distributions  as  being  equivalent  due  to  the 
large  number  of  data  instances;  therefore,  the  modified  sample  size  nm  is  instead  set  to  1000.  This 
is  equivalent  to  allowing  the  same  distribution  discrepancy  that  is  accepted  for  the  standard  \2 
test  with  1000  samples.  Using  the  y2  test  with  a  modified  sample  size  of  1000,  allows  for  deviances 
caused  by  factors  other  than  sampling  error.  The  sample  size  formulation  proportional  to  K  is  not 
used  since  it  allows  an  unacceptable  deviance  with  fewer  than  five  bins.  Because  the  results  of  tiny 
hypothesis  test  may  be  misleading,  the  associated  p- values  and  the  distributions  themselves  are 
also  presented.  The  p- value  is  the  probability  of  obtaining  the  sample  (littoral)  distribution  or  one 
more  extreme  by  chance  alone;  thus,  a  p-value  near  zero  indicates  little  confidence  that  that  the 
sample  came  from  the  proposed  distribution. 


2.3.2  Bayesian  Network  Similarity  Score 

Although  comparing  the  marginal  feature  distributions  provides  a  quantitative  measure  of  the  dis¬ 
crepancy  between  the  two  distributions,  it  does  not  consider  the  interdependences  between  variables 
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Figure  3.  Deviance  metric  sensitivity  to  number  of  bins  assuming  a  p-value  ofo%.  Shown  is  the  the  nominal 
and  maximum  deviance  for  two  definitions  of  the  sample  size. 


that  are  explicitly  captured  within  the  Bayesian  network.  A  complete  comparison  between  the  two 
models  should  consider  this  important  model  characteristic.  In  order  to  quantitatively  compare  the 
models,  the  probability  that  two  different  populations  have  the  same  Bayesian  network  parameters 
given  the  recorded  data  is  computed.  This  assumes  that  the  graphical  structure  of  both  networks 
is  identical. 


The  true  parameters  of  the  two  populations  are  denoted  as  0\  and  62  and  the  data  associated  with 
the  two  populations  as  D\  and  D2 .  The  posterior  is  computed  as  P(Q\  =  62  \  P\ ,  A2X  which 
involves  multiplying  the  prior  distribution  of  the  hypothesis  that  P{9\  =  62)  by  the  ratio 

P(DuD2  I  9i  =02)  fP(DuD2  |  9)p(0)d9 

P(DuD 2)  [/  P(Dx  |  0)p(0)  dO ]  [  JP(D2  I  9)p(0)  dO ]  '  1  '  ' 


Defining  f(D)  =  J  P{D\0)p{0)  dO,  the  ratio  becomes 

f(Di  U  £2) 

f(Di)f(D2)  ' 

As  shown  by  Cooper  and  Herskovits  [28]  and  using  the  notation  used  in  Appendix  E, 


(14) 


n  qz 

m  =  II  II 


r(ayo) 


j-j  r(o 'ijk  +  Nijk) 


i-i  j-i  i  +  Nij)  fc=1  r(o^fc) 


(15) 


where  a  is  the  prior  and  F  is  the  standard  gamma  function.  The  natural  logarithm  of  the  ratio  in 
Equation  13  is  used  to  compare  the  feature  distributions  from  the  different  data  sets.  The  more 
positive  the  log-ratio  is,  the  more  likely  the  distributions  are  the  same.  Negative  log-ratios  indicate 
that  the  distributions  are  different.  Because  this  metric  is  not  intuitive,  the  maximum  log-ratios 
are  also  presented;  this  maximum  is  simply  the  CONUS  model  compared  against  itself.  Positive 
log-ratios  near  this  maximum  indicate  greater  similarity  than  positive  log-ratios  near  to  zero. 
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3.  UNCORRELATED  ENCOUNTER  MODEL 


Figures  4  and  5  illustrate  the  optimal  initial  and  transition  network  structures  for  the  littoral 
uncorrelated  model,  respectively.  Note  that  the  optimal  littoral  initial  network  is  identical  to  that 
for  the  CONUS  model  while  the  transition  networks  differ  (see  Appendix  D.l  for  the  network 
candidates  considered).  The  sole  difference  is  that  the  initial  airspeed  (t>)  is  not  a  parent  variable 
in  the  littoral  transition  network.  This  reduces  the  transition  network  complexity  such  than  the 
number  of  parameters  is  reduced  from  22,344  for  the  CONUS  model  to  6972  for  the  littoral  model. 


Figure  Bayesian  network  structure  representing  the  initial  conditions  for  the  littoral  uncorrelated  model. 


3. 1  CONVERGENCE 

The  model  parameter  convergence  as  discussed  in  Section  2.2.1  for  the  initial  network  structure  is 
shown  in  Figure  6.  Both  the  CONUS  and  littoral  models  display  approximately  the  same  general 
behavior,  although  for  all  iterations,  the  normalized  mean  convergence  is  lower  for  the  CONUS 
model.  This  is  expected  due  to  the  significantly  larger  amount  of  data  available  for  the  CONUS 
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Figure  5.  Transition  network  structure  for  the  littoral  uncorrelated  model.  The  transition  network  propagates 
the  initial  conditions  during  the  encounter  and  is  updated  at  one  second  intervals . 

model  which  results  in  a  reduced  sampling  error.  Because  of  the  similar  behavior  (the  same  ob¬ 
served  ratio  between  the  littoral  and  CONUS  values),  the  mean  of  the  ratio  between  the  two  mean 
convergence  is  computed.  This  results  in  an  estimate  of  the  relative  convergence  between  the  two 
models.  The  mean  of  the  ratios  between  the  CONUS  and  littoral  convergences  shown  in  this  fig¬ 
ure  is  3.83,  indicating  that  the  littoral  model  parameters  converged  to  about  26%  of  that  for  the 
CONUS  model. 


Figure  6.  Uncoirelated  model  initial  parameter  convergence  for  the  CONUS  and  littoral  models. 


Figure  7  shows  the  normalized  mean  convergence  for  the  transition  network.  The  mean  of  the  ratios 
between  the  two  models  is  reduced  from  3.82  for  the  initial  network  to  2.39  for  the  transition  net- 
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work.  This  slight  reduction  is  caused  by  an  associated  reduction  in  the  total  number  of  parameters 
that  must  be  estimated  in  the  littoral  transition  network. 


Figure  7.  Uncorrelated  model  transition  network  parameter  convergence  for  the  CONUS  and  littoral  models. 


From  these  two  convergence  illustrations,  there  is  an  observed  relationship  between  the  normalized 
mean  convergence  and  the  total  number  of  estimated  parameters  together  with  the  number  of 
observations  used  to  build  the  model.  This  relationship,  the  mean  ratio  between  the  normalized 
mean  convergences  for  the  CONUS  and  littoral  models,  is  observed  to  be  approximately 

^  /  AT conus  ^littoral 
V  -^littoral  Tl  conus 

where  N  is  the  total  number  of  observations  (equivalent  to  Nt  for  a  model  without  missing  data),  n 
is  the  the  total  number  of  parameters  in  the  model,  and  £  is  a  correction  constant.  This  correction 
factor  is  introduced  because  the  expression  does  not  consider  the  complex  model  interdependences 
and  how  the  data  affects  the  selection  of  the  optimal  network.  From  the  data,  ^  is  1.34  for  the 
initial  network  and  1.2  for  the  transition  network,  indicating  strong  confidence  in  this  relationship 
between  the  convergence  metric  and  model’s  number  of  parameters  and  observations. 

Although  the  normalized  mean  convergence  allows  for  the  quantitative  comparison  between  the 
parameter  convergence  of  the  two  models,  the  P(nmac  |  enc)  convergence  provides  a  metric  which 
demonstrates  how  the  quantity  of  data  together  with  the  optimal  network  structure  will  affect 
the  specific  quantity  of  interest.  This  P(nmac  |  enc)  estimate  using  the  procedure  discussed  in 
Section  2.2.2  for  both  the  CONUS  and  the  littoral  models  is  shown  in  Figure  8.  From  visual 
inspection,  it  was  determined  that  the  CONUS  model  converged  at  approximately  18  iterations, 
while  the  littoral  model  converged  at  26  iterations.  This  aligns  with  the  1%  threshold  when  assessing 
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the  percent  deviation  from  the  final  P(mnac  |  enc)  estimate — both  the  littoral  and  CONUS  model 
converged  to  within  1%  at  18  iterations.  The  CONUS  model  converged  to  within  5%  of  the  final 
estimate  immediately  (at  the  first  iteration)  while  the  littoral  model  converged  at  iteration  12.  The 
theoretical  P(nmac  |  enc)  should  be  1.6  ■  10  5  under  the  assumptions  presented  in  Section  2.2.2. 
The  P(nmac  |  enc)  in  the  figure  may  not  exactly  match  because  of  sampling  error,  and  the  model 
characteristics  (e.g.,  acceleration,  turn  rates)  that  were  not  considered  in  the  extensive  assumptions 
used  for  the  1.6  •  10  *  approximation. 


•1CT3 


Figure  8.  Uncorrelated  P(nmac  \  enc)  convergence  for  the  CONUS  and  littoral  models. 


3.2  MODEL  COMPARISON 

Figure  9  illustrates  the  comparison  of  the  marginal  discrete  distributions  for  the  model  variables. 
Superimposed  is  the  p- value  calculated  from  Equation  12  using  the  standard  x2  probability  dis¬ 
tribution  with  K  —  1  degrees  of  freedom.  The  p- value  is  the  probability  of  obtaining  the  sample 
(littoral)  distribution  or  one  more  extreme  by  chance  alone;  a  p- value  near  zero  indicates  little  con¬ 
fidence  that  the  two  distributions  are  equivalent.  The  marginal  distributions  for  the  non-geographic 
variables,  including  airspeed,  acceleration,  vertical  rate,  and  turn  rate,  were  marginalized  assuming 
an  objective  (uniform)  distribution  over  airspace  class  and  layer  (Eq.  8)  while  airspace  class  and 
altitude  layer  distributions  are  the  standard  marginal  distributions  (Eq.  7). 

The  p-value  results  for  airspace  class,  vertical  rate,  and  acceleration  are  close  to  but  below  the 
commonly  used  1%  threshold.  This  indicates  that  the  distributions  are  not  equivalent  according  to 
the  threshold,  but  the  deviances  observed  for  these  variables  are  relatively  minor  compared  to  the 
airspeed  and  altitude  layer  deviances.  The  hypothesis  tests  for  airspeed,  altitude  layer  and  turn 
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Figure  9.  Uncorrelated  relative  frequency  feature  comparison  with  the  modified  \2  p-value  superimposed, 
denoted  p  (Eq.  12). 


17 


rate  demonstrate  the  results  with  high  confidence,  defined  as  a  p-value  very  near  zero  or  greater 
than  10%.  The  test  indicates  that  the  turn  rate  distributions  are  equivalent  while  the  airspeed 
and  altitude  layer  distributions  are  different.  In  general,  the  marginal  distribution  comparison 
illustrates  that  littoral  1200-code  aircraft  have  higher  airspeeds  and  travel  at  higher  altitudes  than 
their  CONUS  equivalent. 

Table  2  shows  the  Bayesian  network  similarity  score  for  the  initial  and  transition  networks.  Also 
shown  is  the  maximum  possible  value  which  is  found  by  comparing  the  CONUS  model  to  itself. 
Unfortunately,  this  metric  can  only  be  computed  when  the  network  structures  are  the  same  for 
both  of  the  models  being  compared.  The  optimal  transition  network  structures  are  different  for  the 
CONUS  and  littoral  models,  thus  the  littoral  transition  network  was  formed  to  match  that  for  the 
CONUS  model.  This  test,  discussed  in  Section  2.3.2,  indicates  that  the  initial  network  parameters 
are  different  between  the  models  with  very  high  confidence,  but  that  the  transition  networks  are 
similar.  From  Figure  9,  the  littoral  altitude  layer  distribution  was  statistically  different  than  for 
the  CONUS  model.  Therefore,  the  Bayesian  similarity  score  test  was  also  performed  for  each 
instantiation  of  altitude  layer  and  airspace  class,  16  in  total,  and  consistent  results  to  those  in 
Table  2  were  observed  for  each  instantiation — all  initial  network  similarity  scores  were  negative 
while  all  transition  network  scores  were  positive. 


TABLE  2 


Uncorrelated  model  similarity  scores. 


Network 

Score 

Maximum  Score 

Initial 

-959,784 

70,831 

Transition 

41,117 

86,122 
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4.  CORRELATED  ENCOUNTER  MODEL 


Figures  10  and  11  show  the  optimal  initial  and  transition  network  structures  for  the  littoral  cor¬ 
related  model,  respectively.  Similar  to  the  littoral  uncorrelated  model,  the  optimal  initial  network 
is  the  same  for  the  CONUS  and  littoral  models  while  the  optimal  transition  networks  differ  (see 
Appendix  D.2  for  the  network  candidates  considered).  The  altitude  layer  variable  (L)  is  not  a 
parent  variable  in  the  littoral  correlated  transition  network  while  it  is  in  the  CONUS  model.  The 
absence  of  the  altitude  layer  as  a  parent  variable  reduced  the  number  of  parameters  in  the  transition 
network  from  8100  for  the  CONUS  model  to  324  for  the  littoral  model,  a  25  fold  reduction. 


Figure  10. 


Bayesian  network  sti'ucture  representing  the  initial  conditions  for  the  littoral  correlated  model. 


4.1  CONVERGENCE 

Figure  12  illustrates  the  normalized  mean  convergence  for  the  initial  network.  The  mean  ratio  of 
the  normalized  mean  convergence  for  the  littoral  model  normalized  by  that  for  the  CONUS  model 
was  2.99,  indicating  that  the  littoral  model  converged  to  33%  of  that  for  the  CONUS  model.  The 
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Figure  11.  [ Transition  network  structure  for  the  littoral  correlated  model. 

relationship  of  Equation  16  was  found  to  he  equally  valid  for  the  correlated  initial  network,  with 
£  being  1.06.  The  normalized  mean  convergence  in  Figure  12  demonstrates  convergence  behavior, 
as  opposed  to  remaining  constant,  indicating  that  the  individual  data  subsets  do  not  encompass 
the  complete  diversity  of  encounters.  Hence,  several  iterations  are  required  before  the  normalized 
mean  convergence  metric  approaches  steady  state. 


Figure  12.  Correlated  model  initial  parameter  convergence  for  the  CONUS  and  littoral  models. 
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Figure  13  illustrates  the  convergence  for  the  transition  network.  Contrary  to  the  convergence 
for  the  previous  networks  discussed,  the  normalized  mean  convergence  is  lower  in  general  for  the 
littoral  model  relative  to  the  CONUS  model.  This  is  indicated  by  the  mean  ratio  of  the  normalized 
convergence  being  0.93  while  it  was  greater  than  two  for  the  other  comparisons.  This  behavior 
is  caused  by  the  littoral  correlated  transition  network  having  25  times  fewer  parameters  than  the 
CONUS  network  while  approximately  10%  of  the  CONUS  data  was  used  for  the  littoral  model. 
The  correction  factor  £  for  this  case  was  0.69,  which  is  lower  than  that  for  the  other  convergence 
curves.  This  result  indicates  that  Equation  16  with  £  set  to  unity  is  only  an  approximation  of  the 
benefit  of  adding  data  to  the  encounter  models  using  the  normalized  mean  convergence  metric. 
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Figure  13.  Correlated  model  transition  network  parameter  convergence  for  the  CONUS  and  littoral  models. 


Although  there  is  confidence  that  the  littoral  transition  network  parameters  converged,  there  is  still 
interest  in  the  P(nmac  |  enc)  convergence.  Similar  to  the  discrete  marginal  distribution  comparison, 
a  uniform  marginalization  over  A  and  L  was  assumed  to  remove  the  effects  of  differing  geographic 
distributions.  This  is  illustrated  in  Figure  14.  The  CONUS  model  was  determined  (by  visual 
inspection)  to  converge  quite  quickly,  after  about  12  iterations,  while  the  NMAC  probability  for 
the  littoral  model  was  determined  to  converge  after  about  28  iterations. 

The  convergence  curve’s  percent  deviation  from  the  final  P(mnac  |  enc)  is  illustrated  in  Figure  15. 
The  littoral  and  CONUS  model  convergence  falls  within  the  5%  threshold  at  31  and  20  iterations, 
respectively,  while  neither  model’s  convergence  falls  and  remains  within  the  1%  threshold.  The 
littoral  model  would  converge  to  below  5%  at  27  iterations,  but  the  percent  deviation  increases  to 
5.5%  at  iteration  30.  The  result  that  the  models  do  not  converge  to  within  1%  indicates  that  this 
arbitrary  convergence  threshold  cannot  be  used  independant  of  the  general  convergence  behavior 
since  it  was  assumed  that  the  CONUS  model  converged. 
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Figure  14 •  Correlated  P(mnac  |  enc)  convergence  for  the  CONUS  and  littoral  models. 


Figure  15.  Correlated  P(nmac  |  enc)  convergence  expressed  as  the  difference  from  the  final  estimate.  The 
absolute  value  is  presented,  and  the  percent  deviation  is  truncated  at  100%.  Doing  so  removes  the  CONUS 
rnodeFs  first  iteration  point  which  is  at  approximately  250%. 


4.2  MODEL  COMPARISON 

Similar  the  the  comparison  process  for  the  uncorrelated  model,  the  marginal  distributions  for  the 
variables  in  the  initial  network  are  first  compared,  followed  by  the  Bayesian  network  similarity 
score.  For  the  purpose  of  the  marginal  comparison,  the  variables  for  both  aircraft  were  combined 
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into  one  marginal  distribution.  The  geographic  variables,  along  with  the  airspeed,  acceleration, 
vertical  rate,  and  turn  rate  are  shown  in  Figure  16.  The  geographic  variables  (i.e.,  altitude  layer 
and  airspace  class)  were  determined  to  have  different  distributions  with  high  confidence  while  the 
other  variables  were  determined  to  have  similar  distributions.  The  airspeed  distribution  has  the 
largest  discrepancy  of  these  non-geographic  variables,  which  is  indicated  by  the  relatively  low  p- 
value.  In  general,  it  was  observed  that  more  littoral  encounters  occur  in  the  lowrer  altitude  band,  and 
that  aircraft  in  littoral  encounters  tend  to  have  hnver  vertical  rates  and  turn  rates  when  compared 
with  CONUS  encounters. 

Figure  17  illustrates  the  marginal  comparison  for  the  variables  that  are  defined  at  the  closest  point 
of  approach  termed  the  geometric  variables.  The  hypothesis  test  indicates  that  all  of  the  feature 
distributions  are  nearly  the  same,  except  for  the  the  approach  angle  distribution.  The  aircraft 
category  hypothesis  test  indicates  that  the  distributions  are  not  equivalent,  but  only  nearly  so  with 
a  p- value  of  0.99%. 

Table  3  shows  the  Bayesian  similarity  scores.  Both  the  initial  and  transition  networks  for  the  littoral 
model  were  determined  to  be  similar  to  the  CONUS  model  although  the  littoral  transition  network 
corresponded  better  to  the  CONUS  model  than  did  the  initial  network.  Similar  to  the  methodology 
for  the  uncorrelated  model,  the  individual  parental  instantiations  were  also  compared.  The  CONUS 
and  littoral  models  agreed  (all  log-ratios  were  positive)  except  for  the  single  instantiation  of  airspace 
classified  as  “other”  and  the  lower  altitude  layer  for  the  initial  network.  This  instance  may  be  caused 
by  differing  airspace  procedures  over  the  oceanic  airspace  at  lower  altitudes  away  from  terminal 
areas. 


TABLE  3 


Correlated 

model 

similarity  scores. 

Network 

Score 

Maximum  Score 

Initial 

1705 

20,413 

Transition 

20,910 

36,467 
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Figure  16.  Correlated  relative  frequency  feature  comparison. 
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Figure  17.  Correlated  relative  frequency  feature  variable  comparison  for  variables  defined  at  CPA . 
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5.  SUMMARY 


The  purpose  of  this  analysis  was  to  build  a  correlated  and  an  uncorrelated  conventional  encounter 
model  for  the  littoral  regions  of  the  National  Airspace  System.  Recently,  an  exhaustive  set  of 
encounter  models  was  built  for  the  continental  United  States,  encompassing  conventional  and  un¬ 
conventional  aircraft  in  correlated  and  uncorrelated  encounters.  Although  these  CONUS  models 
captured  encounters  in  the  littoral  region  surrounding  the  CONUS,  they  did  not  explicitly  classify 
the  encounters  into  the  littoral  and  non-littoral  regions.  Hence,  the  nine  months  of  radar  data  used 
to  build  the  original  models  were  filtered  to  build  these  models  specific  to  the  littoral  region.  A 
littoral  unconventional  model  was  not  built  because  of  the  very  low  occurrence  of  aircraft  classified 
as  unconventional  (e.g.,  gliders,  balloons)  that  were  observed  in  the  littoral  airspace. 

The  littoral  region  was  broadly  defined  as  the  region  where  a  maritime  UAS  would  be  likely  to 
operate.  Specifically,  the  littoral  region  was  defined  with  respect  to  the  shoreline,  and  an  additional 
buffer  of  approximately  1  NM  was  added  oceanward  to  remove  any  traffic  that  may  be  identified  as 
littoral  when  transiting  small  oceanic  inlets  or  bays.  After  filtering,  fewer  than  10%  of  the  data  were 
determined  to  be  oceanic  10%  of  encounters  and  3.8%  of  1200-code  track  data  were  appropriate 
for  building  the  correlated  and  uncorrelated  models,  respectively.  Because  of  this  relatively  small 
quantity  of  data,  one  of  the  primary  concerns  was  the  convergence  of  the  model.  In  addition,  the 
CONUS  and  littoral  models  were  directly  compared  to  ensure  that  the  littoral  and  CONUS  models 
were  statistically  and  realistically  different. 

In  order  to  determine  if  the  models  converged,  two  metrics  were  formed.  First,  the  change  in  the 
model  parameters  as  data  were  added  was  assessed.  Second,  the  primary  safety  analysis  metric  of 
interest  -P(n mac  |  enc)—  was  estimated  as  addition  data  were  added.  These  convergence  metrics 
were  directly  compared  to  those  of  the  CONUS  models.  To  determine  whether  the  models  were 
equivalent,  the  marginal  discrete  feature  distributions  were  compared  using  a  modified  \ 2  similarity 
metric.  Since  this  test  was  not  able  to  capture  the  important  dependencies  in  the  Bayesian  network 
models,  the  littoral  and  CONUS  model  were  also  compared  using  what  was  termed  the  Bayesian 
network  similarity  score  which  directly  compared  the  model  parameters. 

The  primary  conclusions  of  this  work  were: 

1.  Both  the  littoral  uncorrelated  and  correlated  models  converged.  This  was  determined  primar¬ 
ily  through  analysis  of  the  P(nmac  |  enc)  as  additional  data  were  added  since  in  all  but  one 
case,  the  littoral  model  parameters  did  not  converge  to  the  level  of  the  CONUS  model.  In 
the  other  case,  the  littoral  model  converged  to  a  greater  extent  because  there  were  25  times 
fewer  parameters  to  be  estimated  in  the  littoral  optimal  model  than  for  the  CONUS  optimal 
model.  For  both  the  correlated  and  uncorrelated  models,  the  P(nmac  |  enc)  converged  much 
faster  for  the  CONUS  model.  The  CONUS  model  required  18  and  12  of  the  32  iterations 
to  converge  for  the  uncorrelated  and  correlated  model,  respectively,  while  the  littoral  model 
required  26  and  28  iterations,  respectively. 

2.  It  was  determined  that  neither  littoral  model  was  significantly  similar  to  its  CONUS  coun¬ 
terpart  to  warrant  not  using  the  littoral  models. 
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(a)  For  the  uncorrelated  models,  the  littoral  initial  network  was  found  to  be  significantly 
dissimilar  from  the  CONUS  initial  network  when  using  both  the  marginal  distribution 
test  and  the  Bayesian  network  similarity  score.  Specifically,  the  airspeed  and  altitude 
layer  distribution  varied  greatly  between  the  littoral  and  CONUS  uncorrelated  initial 
networks.  The  littoral  and  CONUS  correlated  model  initial  networks  were  found  to  be 
similar  using  the  Bayesian  network  similarity  score,  although  the  geographic  variables 
(airspace  class  and  altitude  layer)  and  the  approach  angle  at  closest  point  of  approach 
(CPA)  were  found  to  be  dissimilar. 

(b)  The  transition  networks  for  both  the  littoral  correlated  and  uncorrelated  models  were 
found  to  be  similar  to  the  CONUS  models  using  the  Bayesian  network  similarity  score. 
This  was  true  even  though  the  optimal  littoral  transition  network  was  not  used  because 
the  scoring  metric  required  the  same  network  structure  for  the  models  being  compared. 
For  both  the  correlated  and  uncorrelated  models,  the  littoral  and  CONUS  initial  net¬ 
works  were  found  to  be  less  similar  than  the  transition  networks.  This  discrepancy 
is  thought  to  be  caused  by  underlying  differences  in  the  airspace  affecting  the  initial 
network,  while  the  pilot’s  behavior,  which  would  affect  the  transition  network,  remains 
similar. 


5.1  FUTURE  WORK 

A  comprehensive  safety  analysis  requires  the  characterization  of  both  the  encounter  attributes  (as 
was  accomplished  in  this  document)  as  well  as  the  airspace  density.  The  NMAC  rate  in  a  given 
airspace  is  the  combination  of  the  expected  encounter  rate  and  the  probability  that  an  encounter 
event  leads  to  a  NMAC.  The  expected  encounter  rate  is  directly  proportional  to  the  average  airspace 
density. 

It  is  not  trivial  to  characterize  the  noncooperative  traffic  density  because  primary  radar  returns 
consist  of  clutter  and  birds  in  addition  to  the  noncooperative  traffic  of  interest.  Furthermore,  many 
ATC  radars  cannot  estimate  altitude  (other  than  ARSR-4  sensors  on  the  exterior  of  the  CONUS), 
so  either  altitude  cannot  be  considered  or  it  must  be  estimated  using  a  technique  called  multian- 
gulation.  Lincoln  Laboratory  has  completed  a  preliminary  study  classifying  noncooperative  tracks 
into  birds  and  aircraft  [29]  and  has  also  developed  preliminary  algorithms  for  multiangulation  using 
a  network  of  two-dimensional  ATC  radars  [30]  It  is  therefore  recommended  that  a  methodology 
for  characterizing  the  littoral  noncooperative  density  be  established. 

Encounter  characteristics  and  traffic  density  are  expected  to  change  as  the  Federal  Aviation  Administration 
implements  the  Next  Generation  Air  Transportation  System  (NextGen),  commercial  air  traffic  den¬ 
sity  increases,  and  newer  vehicles  are  introduced  into  the  NAS  (e.g.,  UASs,  very-light  jets).  It  is 
important  that  the  encounter  and  airspace  density  models  are  kept  current  throughout  the  devel¬ 
opment  and  certification  of  the  UAS  SAA  systems.  This  ensures  that  conclusions  derived  from 
the  models  are  valid  when  the  system  reaches  initial  operating  capability.  This  process  includes 
capturing  recent  airspace  data,  monitoring  the  airspace  characteristics  for  changes,  and  updating 
the  encounter  models  as  necessary. 
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The  I)HS  UAS  operational  environment  may  extend  beyond  the  limits  of  CONUS-based  ground 
radars.  Therefore,  it  is  important  to  characterize  the  oceanic  environment  beyond  the  range  of 
these  sensors — e.g.,  the  Carribean  Sea,  northern  coast  of  South  America.  Data  sources  and  a 
methodology  for  ut  ilizing  these  sensors  should  be  established  to  create  encounter  and  density  models 
representative  of  the  expected  oceanic  operating  environment. 
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APPENDIX  A 

CONTINUOUS  FEATURE  COMPARISON 


The  following  sections  display  the  continuous  model  feature  distributions  that  were  discretized 
when  building  the  models  (except  for  variables  that  are  fundamentally  discrete,  such  as  airspace 
class).  The  distributions  are  normalized  such  that  the  integral  over  the  feature  limits  for  both  the 
CONUS  and  littoral  models  are  equal. 

A.l  UNCORRELATED  MODEL 


Vertical  Rate  (ft/m in) 

Figure  A-l.  Aircraft  vertical  rate  in  uncorrelated  encounters  magnified  for  detailed  view  of  distribution  tails. 
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Figure  A-2.  Uncorrelated  relative  frequency  feature  comparison  for  continuous  variables.  Of  note  are  the  low 
period  altitude  oscillations  which  are  caused  by  altitude  quantization  while  the  longer  period  littoral  altitude 
oscillations  correspond  with  standard  VFR  cruising  altitudes — these  are  not  present  in  the  CONUS  data  due 
to  terrain  variation. 
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A. 2  CORRELATED  MODEL 


0  100  200  300  400  500  600  -1.5  -I  -0.5  0  0.5  1  1.5 


Airspeed  (kt) 


Acceleration  (kt/s) 
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Vertical  Rate  (ft/min)  Turn  Rate  (deg/s) 

Figure  AS.  Correlated  relative  frequency  feature  comparison  for  continuous  variables.  The  MSL  altitude  is  the 
altitude  of  the  higher  aircraft  at  CPA. 
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Figure  A- 4 .  Correlated  relative  frequency  feature  comparison  for  continuous  variables  defined  at  the  closest 
point  of  approach . 
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Vertical  Rate  (ft /min) 

Figure  A-5.  Aircraft  vertical  rate  in  correlated  encounters  magnified  for  detailed  view  of  distribution  tails. 
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APPENDIX  B 
MODEL  VARIABLES 


The  model  variables  are  reprinted  from  the  CONUS  model  documentation  for  reference. 

B.l  UNCORRELATED  MODEL 

There  are  six  variables  in  the  uncorrelated  encounter  model: 

•  Airspace  class  A:  The  airspace  class  variable  was  incorporated  into  the  model  to  account 
for  the  variation  in  how  aircraft  fly  in  different  airspace  classes.  This  variable  may  take  on  one 
of  four  values:  B,  C,  D,  and  O,  indicating  which  class  of  airspace  the  aircraft  is  in.  The  values 
B,  C,  and  D  correspond  to  the  controlled  airspace  classes  defined  by  the  FA  A.  The  value  O 
represents  “other  airspace,”  including  airspace  Classes  E  and  G.  Note  that  there  should  be 
no  VFR  aircraft  in  Class  A  due  to  the  requirement  that  aircraft  in  that  Class  of  airspace  fly 
under  Instrument  Flight  Rules. 

•  Altitude  layer  L:  Airspace  is  divided  into  four  altitude  layers,  in  a  process  similar  to 
prior  encounter  models  developed  by  Eurocontrol.  The  first  layer  spans  from  500  to  1200  ft 
Above  Ground  Level  (AGL)  to  capture  aircraft  in  the  traffic  pattern  or  performing  low- 
level  maneuvers.  The  second  layer  spans  a  transition  zone  from  1200  to  3000  ft  AGL,  the 
cruise  altitude  where  the  hemispheric  rule  begins.  The  third  layer  spans  from  3000  ft,  AGL 
to  5000  ft  AGL  covering  a  mix  of  low-altitude  enroute  and  maneuvering  aircraft.  The  fourth 
layer  includes  airspace  above  5000  ft  AGL  and  would  cover  most  enroute  VFR  traffic. 

•  Airspeed  v:  We  model  true  airspeed  and  allow  it  to  vary  during  flight. 

•  Acceleration  v:  We  allow  airspeed  acceleration  to  vary  every  second,  providing  higher- 
fidelity  motion  than  prior  models. 

•  Turn  rate  0:  Turn  rate  is  permitted  to  change  every  second  in  our  model.  The  prior 
European  and  ICAO  cooperative  models  allowed  only  a  single  turn  during  an  encounter. 

•  Vertical  rate  h:  The  vertical  rate  is  permitted  to  change  at  every  second.  All  prior  cooper¬ 
ative  models  allowed  only  a  single  vertical  acceleration  period  during  an  encounter. 

B.2  CORRELATED  MODEL 

The  aircraft  at  the  higher  altitude  at  TCA  is  called  ACl.  The  other  aircraft  is  called  AC2.  The 
remainder  of  this  section  explains  how  we  model  the  relationship  between  the  behavior  of  these  two 
aircraft.  We  model  the  following  variables  to  describe  each  encounter: 

•  Vertical  Miss  Distance  vrnd:  Vertical  miss  distance  is  defined  as  the  difference  in  altitude 
between  the  two  aircraft  at  the  point  of  closest  approach  (point  of  minimum  horizontal  miss 
distance). 
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•  Horizontal  Miss  Distance  hmd:  Horizontal  miss  distance  is  defined  as  the  horizontal  range 
between  the  two  aircraft  at  the  point  of  closest  approach  (point  of  minimum  horizontal  miss 
distance). 

•  Airspace  class  A:  This  variable  may  take  on  one  of  four  values:  B,  C,  D,  and  0,  indicating 
which  class  of  airspace  the  encounter  is  in.  The  values  B,  C,  and  D  correspond  to  the  controlled 
airspace  classes  defined  by  the  FAA.  The  value  0  represents  “other  airspace,”  which  includes 
Class  A,  E,  and  G  airspace.  The  airspace  class  variable  was  incorporated  into  the  model  to 
account  for  the  variation  in  how  aircraft  fly  in  different  airspace  classes.  Note  that  Class  A 
can  be  distinguished  from  Classes  E  and  G  by  referring  to  the  next  variable:  altitude  layer. 

•  Altitude  layer  L:  Airspace  is  divided  into  five  altitude  layers,  similar  to  prior  encounter 
models  developed  by  Eurocontrol.  The  first  layer  spans  from  1000  to  3000ft  Above  Ground 
Level  (AGL)  to  capture  encounters  in  the  traffic  pattern.  Encounters  that  occur  below 
1000  ft  AGL  are  filtered  out;  TCAS,  for  example,  will  not  issue  resolution  advisories  for  en¬ 
counters  occurring  below  1000  ft  AGL.  The  second  layer  spans  from  3000  ft  AGL  to  10,000  ft 
Mean  Sea  Level  (MSL),  the  upper  limit  for  aircraft  without  transponders  and  the  250 kt  air¬ 
speed  restriction.  The  third  layer  spans  from  10,000  ft  MSL  to  FL180,  the  upper  limit  for 
VFR  traffic  and  the  beginning  of  Class  A  airspace.  The  fourth  layer  spans  from  FL180  to 
FL290,  the  beginning  of  the  Reduced  Vertical  Separation  Minimum  (RVSM).  The  fifth  layer 
includes  all  airspace  above  FL290.  The  altitude  layer  for  an  encounter  is  determined  by  the 
altitude  of  AC1  at  TCA. 

•  Approach  Angle  ft:  Approach  angle  is  the  heading  of  AC2  relative  to  AC1  at  TCA.  Fig¬ 
ure  B-l  shows  how  j3  is  calculated.  The  European  encounter  models  also  used  this  definition. 

•  Bearing  Figure  B-l  shows  how  the  bearing  of  AC2  relative  to  ACl  is  calculated  at  TCA. 
Given  /?,  /mid,  and  wc  can  uniquely  identify  the  lateral  position  and  orientation  of  AC2 
relative  to  ACl  at  TCA. 

•  Category  C\  and  C2:  We  currently  divide  aircraft  into  two  categories:  1200-code  aircraft 
and  discrete-code  aircraft.  Compared  to  discrete-code  aircraft,  aircraft  squawking  1200  tend 
to  accelerate  more  frequently,  fly  at  lower  altitudes  and  at  lower  speeds,  and  are  often  smaller 
aircraft. 

•  Initial  Airspeed  v\  and  V2 :  We  model  initial  airspeeds  of  the  two  aircraft.  We  assume  zero 
wind  since  aircraft  close  enough  to  be  in  an  encounter  situation  are  most  likely  within  the 
same  air  mass  and  experiencing  approximately  the  same  windfield. 

•  Acceleration  v\  and  U2:  The  model  assumes  constant  airspeed  acceleration  for  the  duration 
of  the  encounter  as  was  the  case  in  the  prior  European  encounter  models.  This  is  a  reasonable 
approximation  given  the  short  50  s  duration  of  each  encounter. 

•  Turn  rate  and  1P2:  Turn  rate  is  permitted  to  change  every  second  in  the  model.  The 
prior  European  and  ICAO  models  allowed  only  a  single  turn  during  an  encounter. 

•  Vertical  rate  h\  and  I12:  The  vertical  rate  is  permitted  to  change  at  every  second.  All  prior 
cooperative  models  allowed  only  a  single  vertical  acceleration  period  during  an  encounter. 
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Figure  B-l.  Approach  angle  ((3)  and  bearing  (\)  definition. 
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APPENDIX  C 

UNCORRELATED  MODEL  ADDENDUM 


Section  6.1  of  the  un  cor  related  model  documentation  described  the  process  for  estimating  the 
NMAC  probability  [7].  The  method  for  estimating  the  NMAC  probability  in  this  section  should  be 
used  in  its  place.  The  process  for  initializing  uncorrelated  encounters  is  reprinted  here  for  reference 
followed  a  description  of  the  modifications  to  the  NMAC  probability  calculation. 

C.l  ENCOUNTER  INITIALIZATION 

We  use  rejection  sampling  to  generate  the  initial  conditions  of  an  encounter.  Rejection  sampling 
involves  proposing  a  series  of  candidate  samples  from  a  random  distribution  until  choosing  one  that 
meets  a  set  of  criteria.  The  process  we  use  for  generating  initial  conditions  for  encounters  is  as 
follows: 

1.  Generate  airspeeds,  vertical  rates,  turn  rates,  and  accelerations  for  AC1  and  AC2  according  to 
their  models  such  that  they  belong  to  the  same  airspace  class  and  altitude  layer.  Forcing  this 
constraint  can  be  done  using  rejection  sampling.  Simply  generate  AC1  and  AC2  independently 
and  reject  both  if  they  have  a  different  airspace  class  or  altitude  layer.1  AC1  and  AC2  are 
termed  the  own  and  intruder  aircraft,  respectively. 

2.  Initialize  the  position  of  AC2  on  the  surface  of  the  encounter  cylinder  centered  on  AC1. 
AC2  may  be  initialized  on  the  top,  bottom,  or  side  surfaces  of  the  encounter  cylinder.  The 
probability  of  being  located  on  the  top,  bottom,  or  side  is  proportional  to  the  volume  swept 
out  by  each  encounter  cylinder  surface.  Once  top,  bottom,  or  side  has  been  selected.  AC2  is 
randomly  positioned  using  a  two-dimensional  uniform  distribution  across  that  surface.  The 
bearing  of  AC2  relative  to  AC1  is  denoted  x- 

3.  The  heading  of  ACl  is  set  to  zero  for  simplicity.  The  heading  of  AC2,  denoted  is  randomly 
selected  from  a  uniform  distribution  over  ’7r,7r). 

4.  If  AC2  was  initialized  on  the  top  of  the  encounter  cylinder,  accept  the  sample  if  the  vertical 
rate  of  AC2  relative  to  ACl,  denoted  prv,  is  negative.  This  ensures  that  AC2  is  penetrating 
the  encounter  cylinder  for  the  first  time. 

5.  If  AC2  was  initialized  on  the  bottom  of  the  encounter  cylinder,  accept  the  sample  if  the 
vertical  rate  of  AC2  relative  to  ACl,  denoted  i>rv,  is  positive.  This  ensures  that  AC2  is 
penetrating  the  encounter  cylinder  for  the  first  time. 

6.  If  AC2  was  initialized  on  the  side  of  the  encounter  cylinder,  accept  the  sample  if  R  •  vr  h 
is  liegative,  where  R  =  (sin  x,  cos  x)  is  the  representation  of  the  horizontal  unit  vector  from 
ACl  to  AC2  in  the  (East,  North)  coordinate  frame  and  vr  ^  is  v2  —  vj.  The  vectors  V]  and  v2 

]Note  that  this  will  result  in  an  incorrect  distribution  over  airspace  class  and  layer  (compared  to  that  observed). 
Reference  Section  C.4  for  the  proper  correction. 
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are  the  horizontal  ground  velocities  of  AC1  and  AC2  respectively.  When  R*  v^h  is  negative, 
the  relative  velocity  of  AC2  with  respect  to  AC1  is  into  the  encounter  cylinder,  and  therefore 
the  encounter  should  be  accepted. 

The  process  is  repeated  until  a  candidate  initialization  is  accepted.  A  byproduct  of  rejection 
sampling  is  that  the  intruder  bearing  distribution  is  nonuniform.  As  one  would  expect,  more 
encounters  occur  head  on  than  from  the  side  or  rear.  Figure  C-l  illustrates  the  initial  horizontal 
variables  for  uncorrelated  encounters  at  the  instant  when  the  intruder  aircraft  penetrates  the  side 
of  the  encounter  cylinder. 


Figure  C-L  Horizontal  plane  encounter  initialization. 


C.2  ESTIMATING  NMAC  PROBABILITY 

This  initialization  procedure  is  the  result  of  the  fundamental  assumption  that  the  two  aircraft  blun¬ 
der  into  close  proximity.  The  penetration  angle  (0)  represents  the  scalar  projection  of  the  relative 
horizontal  velocity  (vr  h)  onto  the  radial  vector  (R)  and  is  a  function  of  the  initial  intruder  bearing, 
intruder  heading,  and  the  initial  horizontal  airspeeds  of  the  aircraft.  An  encounter  is  defined  only  if 
the  intruder  penetrates  the  encounter  cylinder — <fr  must  be  on  the  interval  (— 7i/2,  7t/2).  Therefore, 
the  encounter  depicted  by  Figure  C-l  would  be  rejected  and  a  new  encounter  would  need  to  be 
initialized.  A  similar  procedure  is  required  for  intruders  that  penetrate  the  top  or  bottom  of  the 
encounter  cylinder. 
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Ideally,  the  encounter  parameters  would  be  sampled  according  to  the  appropriate  probability  dis¬ 
tributions  describing  the  expected  encounter  geometry.  If  encounters  were  sampled  according  to 
the  appropriate  distributions,  then  the  NMAC  probability  estimate  would  simply  be  the  number 
of  encounters  resulting  in  NMACs  normalized  by  the  total  number  of  simulated  encounters.  There 
are  three  a  posteriori  corrections  that  must  be  used  when  estimating  the  NMAC  probability. 


1.  Encounters  are  more  likely  to  occur  among  aircraft  with  higher  relative  airspeeds,  but  air¬ 
speeds  are  sampled  from  the  distribution  of  observed  aircraft  airspeeds. 

2.  When  the  intruder  penetrates  the  side  face  of  the  cylinder,  the  initial  bearing  and  heading 
of  the  intruder  are  sampled  uniformly,  but  intruder  bearings  and  headings  which  result  in  a 
penetration  angle  nearer  to  zero  are  more  likely.  In  other  terms,  encounters  where  the  relative 
velocity  penetrates  the  encounter  cylinder  directly  are  more  likely  to  occur  than  when  the 
relative  velocity  skims  the  edge  of  the  encounter  cylinder.2 

3.  Encounters  are  not  generated  with  the  correct  proportion  of  intruders  penetrating  the  top  or 
bottom  of  the  cylinder  compared  to  the  side  of  the  cylinder.  The  true  proportion  is  derived 
from  the  volume  that  each  face  of  the  encounter  cylinder  sweeps  out  per  unit  time.  Hence, 
the  effort  of  estimating  P(nmac  |  enc)  may  be  decoupled  into  the  horizontal  and  vertical 
components  and  then  combined  into  an  overall  estimate. 


Since  encounters  are  not  sampled  from  the  actual  distributions,  the  encounters  must  be  weighted 
to  compensate  for  the  difference  between  the  actual  and  sampling  distributions.  This  process  is 
known  as  importance  sampling  [23].  The  general  process  for  estimating  P(ninac  |  enc)  using  the 
actual  distribution  p(x)  and  the  sampling  distribution  q(x)  is 


P(nmac  |  enc)  = 


(C-l) 


where  f(x)  is  the  probability  of  an  NMAC  given  the  generated  samples  (x^),  where  i  denotes  an 
encounter  sample.  It  should  be  noted  that  f(x)  is  not  constrained  to  be  P(nmac  |  enc),  but  rather 
any  metric — e.g.,  vertical  miss  distance,  course  deviation.  Because  the  summation  of  the  weights  is 
generally  unnormalized  (does  not  sum  to  unity),  we  must  normalize  by  the  term  ^2iP{x^)/q(x^). 

The  weighting  to  correct  the  improper  sampling  of  airspeeds  and  the  penetration  angle  is  described 
by 

p(x^) 
q(x^) 


,(0 

Rh 


cos  0b)  for  initialization  on  side  of  cylinder 


(t) 

%;v 


for  initialization  on  top/ bottom  of  cylinder 


(C-2) 


The  term  cos0  is  equivalent  to  — v^h  •  r.  The  average  relative  volume  that  the  encounter  cylinder 
sweeps  out  per  unit  time  is 

V  =  4re„c/iencWR,h  +  ™encVR<v  ,  (C-3) 

2Section  C..3  describes  the  method  for  accounting  for  this  correction  when  initializing  encounters. 
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where  cr  h  and  i>Ft,v  are  the  magnitudes  of  the  average  relative  horizontal  and  vertical  airspeeds 
between  any  two  aircraft  in  the  airspace,  respectively.  Note  that  UR,h  and  ^rjV  are  not  the  relative 
airspeeds  between  pairs  of  aircraft  that  result  in  an  encounter,  but  rather  between  any  two  aircraft 
in  the  airspace.  Therefore,  they  do  not  need  to  be  weighted  to  adjust  for  those  pairs  resulting  in 
encounters.  The  average  relative  airspeeds  may  be  estimated  directly  from  the  sample  mean  of  the 
initial  relative  airspeed — e.g.,  =  l/iVJ^Upur  The  encounter  rate  is 

Aenc  =PV,  (C-4) 

where  p  is  the  airspace  density  [31].  The  NMAC  rate  is  therefore  defined  by 

An  max:  =  P(nmac  |  enc)Aenc .  (C-5) 

It  should  be  noted  that  this  formation  of  the  encounter  rate  (Equation  C-4)  is  the  rate  at  which  a 
single  aircraft  encounters  a  population  of  intruder  aircraft  as  opposed  to  the  rate  of  all  encounters 
between  any  two  aircraft  in  a  specified  airspace. 

From  Equation  C-3,  the  total  volume  swept  out  by  the  encounter  cylinder  can  be  decoupled  into 
horizontal  and  vertical  components.  The  ratio  of  the  number  of  encounters  penetrating  the  side  of 
the  encounter  cylinder  (rih)  to  the  number  penetrating  the  top  or  bottom  (nv)  is  proportional  to 
the  ratio  of  the  horizontal  and  vertical  swept  volumes,  or 

_  4?’enc/ienc^R,h  /q 

nv  Vv  7rrenc^R,V 


Expanding  Equation  C-l  with  the  weighting  of  Equation  C-2  and  correcting  for  the  proportion  of 
encounters  penetrating  each  cylinder  surface  results  in  the  NMAC  probability  formulation 


P(nmac  |  enc)  = 


/  £^RhCOS 

\  4renc/ienC^R,h  ,  | 

V  £^R!hcos^(i)  j 

1  V  +l 

1  V 

(C-7) 


Indices  i  and  j  denote  encounter  samples  where  the  intruder  penetrates  the  horizontal  and  vertical 
surfaces  of  the  cylinder,  respectively.  The  terms  in  parenthesis  correct  for  the  improper  sampling  of 
the  relative  airspeed  and  the  penetration  angle,  while  the  term  outside  of  the  parenthesis  corrects 
for  the  improper  proportion  of  intruders  initialized  on  each  cylinder  surface. 


C.3  ALTERNATIVE  SAMPLING  METHODOLOGY 

The  method  described  above  for  initializing  AC2  on  the  side  of  the  encounter  cylinder  centered  on 
AC1  is  intuitive.  But,  this  method  results  in  an  abundance  of  short  encounters  where  the  closest 
point  of  approach  (defined  at  the  minimum  horizontal  separation)  is  near  the  radius  of  the  encounter 
cylinder.  These  short  encounters,  on  the  order  of  10  s,  do  not  affect  the  NMAC  probability  because 
an  avoidance  maneuver  is  very  unlikely.  In  addition,  the  standard  method  for  initializing  encounters 
also  does  not  allow  for  the  explicit  definition  of  the  CPA  distribution,  which  is  useful  when  the 
analyst  wants  to  focus  computation  on  close  encounter  situations  (i.e.,  NMACs).  The  procedure 
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Figure  C-2.  Alternative  horizontal  plane  sampling  methodology  illustration. 


as  described  below  is  illustrated  in  Figure  C-2  and  replaces  step  (2)  in  Section  C.l  for  horizontal 
encounters.  Step  (6),  which  ensures  that  the  intruder  penetrates  the  encounter  cylinder,  is  no 
longer  necessary. 

1.  Instead  of  initializing  intruder  aircraft  on  the  horizontal  face  of  the  encounter  cylinder  by  uni¬ 
formly  sampling  the  initial  relative  bearing  (y),  intruder  aircraft  are  initialized  by  uniformly 
sampling  the  horizontal  miss  distance  (HMD)  on  the  interval  [0,7*enc]. 

2.  The  intruder  aircraft  (AC2)  is  notionally  placed  a  distance  HMD  from  ACl  at  CPA.  AC2  may 
be  placed  in  front  or  behind  ACl.  Each  is  equally  probable,  so  one  should  sample  uniformly 
over  the  two  options. 

3.  The  CPA,  assuming  straight  line  trajectories,  is  the  position  where  the  sampled  HMD  is 
satisfied  together  with  the  condition  that  hmd  is  perpendicular  to  v^.  The  vector  hmd  is 
the  position  vector  from  ACl  to  the  CPA. 

4.  AC2  is  then  placed  on  the  encounter  cylinder  by  finding  the  intersection  of  with  the  hor¬ 
izontal  cylinder  boundary.  This  procedure  is  accomplished  such  that  the  intruder  penetrates 
the  encounter  cylinder  at  initialization. 

The  sampling  of  the  intruder  aircraft  heading  is  left  unchanged.  The  initial  relative  altitude,  defined 
at  penetration  of  the  encounter  cylinder,  is  sampled  uniformly  on  the  interval  [  henc,henc].  Note 
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that  the  notional  CF^  created  during  this  process  is  unlikely  to  be  realized  during  simulation 
because  vr  h  is  the  initial  relative  airspeed  and  is  therefore  likely  to  change  during  the  encounter — 
i.e.,  it  is  only  used  for  this  initialization  procedure.  When  using  this  encounter  initialization  method, 
the  term  cos  <p  is  no  longer  needed  in  Equation  C-7,  resulting  in 


P(nrnac  |  enc)  = 


l'gi''Rlh/(j('))''  4renc feenci'Rl, 
)  * 


y  v (*) 
%,h 


+ 


2  r, 


Trr, 

j  ? 


^R,v 


(C-8) 


Note  that  this  procedure  is  only  for  encounters  initialized  on  the  side  face  of  the  encounter  cylinder. 
Although  it  is  beyond  the  scope  of  this  document,  a  similar  procedure  may  be  used  to  initialize 
encounters  when  the  encounter  volume  is  a  sphere.  If  enough  aircraft  tracks  (encounter  model 
samples)  are  formed  prior  to  encounter  initialization,  a  high-confidence  estimate  of  the  ratio  of 
intruders  penetrating  each  face  could  be  formed  according  to  Equation  C-6.  This  estimate  could 
then  be  used  to  correctly  sample  the  encounter  cylinder  penetration  face,  removing  the  need  for 
each  cylinder  face  contribution  to  the  swept  volume  in  Equation  C-8. 


C.4  ALTITUDE  LAYER  AND  CLASS  WEIGHTING 


Encounters  are  generated  according  to  the  altitude  layer  and  airspace  class  distributions  defined  in 
the  model.  These  distributions  represent  the  observed  rate  of  occurrence  of  aircraft  in  each  altitude 
layer  and  airspace  class.  But  from  Equation  C-4,  the  encounter  rate  is  proportional  to  the  airspace 
density,  not  the  cumulative  occurrence  of  aircraft,  in  the  local  airspace.  For  example,  the  greatest 
observed  cumulative  occurrence  (total  flight  time)  of  aircraft  is  found  in  the  airspace  defined  as 
“other”  in  the  model  (corresponding  to  class  E  and  G  airspace),  but  this  airspace  also  occupies  the 
largest  volume.  Furthermore,  the  model  assumes  that  the  exposure  time  (te),  or  the  duration  of 
time  that  the  own  aircraft  operates  in  each  airspace,  is  equally  divided  amongst  each  airspace  class 
and  layer.  Therefore,  the  mean  NMAC  probability  over  all  airspace  classes  and  altitude  layers  that 
considers  these  factors  a  posteriori  is  defined  by 


P(nmac  |  enc)  =  ^  P(nmac  |  enc,  at,li)P(auli  |  enc)  = 


P(n mac  |  enc,  aiJl)plVlt 


r/.*(0 


ZiPiVitP 


(C-9) 


where  i  denotes  each  airspace  class  and  altitude  layer  combination.  The  term  P(ault  |  enc)  is  the 
proportion  of  encounters  expected  in  each  altitude  layer  and  airspace  class  combination. 

If  one  knows  pi,  Vu  and  re  ;  a  priori,  then  the  altitude  and  airspace  class  distributions  can  be  modified 
to  reflect  this  knowledge  before  sampling  from  the  model.  Then,  the  mean  NMAC  probability 
estimate  is  simply  the  NMAC  probability  for  all  encounters.  This  sampling  procedure  may  be 
useful  when  the  aircraft  of  interest  is  expected  to  operate  in  a  specific  airspace  class  or  altitude 
layer.  For  example,  some  aircraft  types  may  only  operate  in  the  first  altitude  band  (500-1200  ft 
AGL),  thus  simulating  encounters  in  the  other  layers  wastes  computation  because  they  will  not  be 
considered  in  the  final  estimate.  If  little  is  known  about  the  expected  operating  environment,  then 
an  objective  (uniform)  assumption  regarding  the  airspace  class  and  altitude  layer  may  be  suitable. 
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APPENDIX  D 

NETWORK  STRUCTURE  CANDIDATES 


This  section  describes  the  network  candidates  that  were  considered  for  the  littoral  models.  The 
optimal  network  was  chosen  by  maximizing  the  Bayesian  network  score  (Equation  E-7).  The 
optimal  scores  are  denoted  and  the  optimal  CONUS  models  are  shown  for  reference.  The  same 
set  of  networks  that  was  considered  for  the  CONUS  models  was  considered  for  the  littoral  models. 
The  candidate  model  structures  were  selected  based  on  expert  knowledge,  previous  encounter  model 
structures,  and  automated  search  techniques. 


D.l  UNCORRELATED  CANDIDATES 
D.1.1  Initial  Network  Candidates 


(1)  Edges:  13, 

Parameters:  71G7 

(2)  Edges:  10, 

Parameters: 

Model 

Score 

Model 

Score 

Littoral 

-38156685 

Littoral 

-38596563 

CONUS 

-967535636 

CONUS 

-975857519 
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A 


(3)  Edges:  12,  Parameters:  7083 


Model 

Score 

Littoral 

-38187483 

CONUS 

-968008451 

(5)  Edges:  10,  Parameters:  1775 

Model  Score 

Littoral  -38475107 
CONUS  -974521777 


Model 

Score 

Littoral 

-38090950 

CONUS 

-966680903 

(6)  Edges:  15,  Parameters:  31,359 


Model 

Score 

Littoral  (Best) 

-37927856 

CONUS  (Best) 

-961629997 
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(7)  Edges:  14,  Parameters:  9855 

Model  Score 

Littoral  -38091977 
CONUS  -966681414 


■0 


h 


v 


L 


A 


(8)  Edges:  0,  Parameters:  29 


Model 

Score 

Littoral 

-39924524 

CONUS 

-1013111341 
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D.1.2  Transition  Network  Candidates 


(1)  Edges:  11,  Parameters:  9296 


Model 

Score 

Littoral 

-1113146 

CONUS 

-29392892 

(2)  Edges:  14,  Parameters:  74,368 


Model 

Score 

Littoral 

-1173002 

CONUS 

-29213939 
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(3)  Edges:  12,  Parameters:  42,784 


Model 

Score 

Littoral 

-1136239 

CONUS 

-29251530 

(4)  Edges:  12,  Parameters:  18,592 


Model 

Score 

Littoral 

-1122499 

CONUS  (Best) 

-29173494 
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iko 


(5)  Edges:  10,  Parameters:  11,312 


Model  Score 


Littoral  -1110877 
CONUS  -29391204 


(6)  Edges:  21,  Parameters:  7,651,840 


Model 

Score 

Littoral 

-1370034 

CONUS 

-29790622 
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(7)  Edges:  10,  Parameters:  5936 


Model 

Score 

Littoral  (Best) 

-1106490 

CONUS 

-29464867 

■(;(<) 

h(t) 

4>(t) 

v(t  +  1) 

h{t  +  1) 

ip{t  + 1) 

(8)  Edges:  0.  Parameters:  16 


Model 

Score 

Littoral 

-15439646 

CONUS 

-407569877 
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D.2  CORRELATED  CANDIDATES 


D.2.1  Initial  Network  Candidates 


A 

L 

X 

P 

hmd 

vrrid 

V’i 

in 

Vl 

V\ 

Cx 

L 

ACl 

1p2 

h  2 

V2 

V2 

c2 

AC2 

(1)  Edges:  0,  Parameters:  83 

Model 

Score 

Littoral 

-682309 

CONUS 

-7225886 
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(2)  Edges:  27,  Parameters:  5203 


Model 

Score 

Littoral 

-646047 

CONUS 

-6746733 
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i 


AC\ 


AC  2 


(3)  Edges:  29,  Parameters:  (5877 


Model 

Score 

Littoral 

-645892 

CONUS 

-6743530 
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(4)  Edges:  25,  Parameters:  1865 


Model 

Score 

Littoral 

-648805 

CONUS 

-6783862 
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AC  l 


AC  2 


(5)  Edges:  41,  Parameters:  24,751 


Model 

Score 

Littoral 

-648264 

CONUS 

-6726995 
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(6)  Edges:  37,  Parameters:  17,358 


Model 

Score 

Littoral 

-643437  (Best) 

CONUS 

-6700437  (Best) 
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D.2.2  Transition  Network  Candidates 


(1)  Edges:  0,  Parameters:  32 


Model 

Score 

Littoral 

-8396812 

CONUS 

-90986115 

AC  1 

,4C2 

i>\  (<) 

hi(t ) 

^2  (f) 

V  V 



+  1) 

/ll  (£  +  1) 

^2  (t  +  1) 

ii2{t  + 1) 

(2)  Edges:  4,  Parameters:  288 


Model 

Score 

Littoral  (Best) 

-920981 

CONUS 

-9826752 
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ACl 

AC2 

MO 

Mo 

MO 

MO 

_ _ _ A _ 

h\  (t  +  1) 

Mi  (t  + 1) 

h2(t  +  1) H 

-fait  + 1) 

(3)  Edges:  6,  Parameters:  1440 


Model 

Score 

Littoral 

-925529 

CONUS 

-9828653 

(4)  Edges:  11,  Parameters:  10,080 


Model 

Score 

Littoral 

-945592 

CONUS 

-9834130 

AC  1 

Mo 

MO 

E _ 1 

/ll((  +  1}  H 

Mi(<  + 1) 

AC  2 

M0 

M0 

_ 3 

+  1)  •' 

M2  (t  +  1) 

(5)  Edges:  10,  Parameters:  7200 


Model 

Score 

Littoral 

-938287 

CONUS  (Best) 

-9822334 
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APPENDIX  E 
BAYESIAN  NETWORKS 


This  appendix  briefly  reviews  Bayesian  networks.  Further  discussion  of  Bayesian  networks  may  be 
found  elsewhere  [32  34]. 

E.l  DEFINITION 

A  Bayesian  network  is  a  graphical  representation  of  a  multivariate  probability  distribution  over 
variables  X  =  X\, . . . ,  Xn.  In  particular,  a  Bayesian  network  is  a  directed  acyclic  graph  G  whose 
nodes  correspond  to  variables  and  edges  correspond  to  probabilistic  dependencies  between  them. 
Associated  with  each  variable  Xx  is  a  conditional  probability  distribution  P(xl  |  7TZ),  where  7Tt 
denotes  an  instantiation  of  the  parents  of  X{  in  the  graph.  The  probability  of  an  instantiation 
of  the  variables  is  specified  directly  by  the  conditional  probability  distributions  in  the  Bayesian 
network: 

P(x)  =  P(xu...,xn)  =  f\P(xi\ni).  (E-l) 

Z  =  1 


E.2  SAMPLING 

It  is  rather  straightforward  to  sample  from  a  multivariate  distribution  represented  by  a  Bayesian 
network.  The  first  step  is  to  produce  a  topological  sort  of  the  nodes  in  the  network.  A  topological 
sort  orders  the  nodes  in  a  Bayesian  network  such  that  if  a  node  Xt  comes  before  Xj  there  does  not 
exist  a  directed  path  from  X3  to  Xx.  Every  Bayesian  network  has  at  least  one  topological  sort,  but 
there  may  be  many.  Efficient  algorithms  exist  for  finding  a  valid  topological  sort  [35]. 

To  produce  a  sample  from  the  joint  distribution  represented  by  a  Bayesian  network,  we  simply 
iterate  through  a  topologically  sorted  sequence  of  the  variables  and  sample  from  their  conditional 
probability  distributions.  The  topological  sort  ensures  that  when  sampling  from  each  conditional 
probability  distribution  the  necessary  parents  have  been  instantiated. 

E.3  PARAMETER  LEARNING 

The  parameters  0  of  a  Bayesian  network  determine  the  associated  conditional  probability  distri¬ 
butions.  Given  some  fixed  network  structure  G ,  we  can  learn  these  parameters  from  data.  In  this 
appendix,  we  assume  that  the  variables  are  discrete. 

Before  discussing  how  to  learn  the  parameters  of  a  Bayesian  network,  it  is  necessary  to  introduce 
some  notation.  Let  rx  represent  the  number  of  instantiations  of  Xx  and  qt  represent  the  number  of 
instantiations  of  the  parents  of  X\.  If  Xt  has  no  parents,  then  qt  —  1.  The  jth  instantiation  of  the 
parents  of  Xx  is  denoted  7Tf j . 
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There  are  E?=i  ri(li  parameters  in  a  Bayesian  network.  Each  parameter  is  written  9%jk  and  deter¬ 
mines  P(Xi  =  k  |  7Tjj),  i.e., 

P(Xi  ~  k  |  / Tij)  =  Oijk  • 

Although  there  are  EF=ir«9i  parameters,  only  EiLi(ri  “  l)?t  are  independent. 

Computing  the  posterior  p(0  |  D,G)  involves  specifying  a  prior  p(9  \  G )  and  applying  Bayes’  rule 


p(0\D)G)  = 


P(D\9,G)p(9\G) 
P(D  |  G) 


P(D  |  9,G)p(0  |  G) 
fP(D\9,G)p(6\G)d9 * 


(E-2) 


If  NtJk  is  the  count  of  =  k  given  7 rjj  in  the  data  ZZ,  then  the  probability  of  the  data  given  the 
parameters  9  is 

pw  1  e) = ft  ft  11  C‘  ■  <e-3) 

i=Ij=l  k=  1 


Let  9ij  —  (0jji, . . .  i9ijri).  Since  9{j  is  independent  of  O^y  when  ij  ^  i’f,  the  prior  probability  of 
the  parameters  assuming  a  fixed  structure  G  is 


p(o\c)  =  \l\lp(etJ\G). 


i=l 


(E-4) 


The  density  p(9ij  |  G)  is  a  distribution  over  relative  frequencies.  Under  some  very  weak  assumptions, 
it  is  possible  to  prove  that  p{9tJ  |  G)  is  Dirichlet  (see  [34],  Section  6.2.3).  Hence, 


p(0ij  I  G)  = 


r(atj0) 

mi!  rw 

n 


ijk 


if  0  <  9ijk  <  1  and  Oijk  =  1 
otherwise 


where  ayi, . . . ,  otijTl  are  the  parameters  of  the  Dirichlet  distribution  and  a^o  =  E^Li  aijk-  For 
the  prior  to  be  objective  (or  noninfonnative),  the  parameters  atjk  must  be  identical  for  all  fc. 
DifTerent  objective  priors  have  been  used  in  the  literature.  Cooper  and  Herskovits  [28]  use  =  1. 
Heckerman,  Geiger,  and  Chickering  [36]  use  and  justify  a{jk  =  l/(r2</2). 

It  is  possible  to  show  that  p(9l3  |  ZZ,  G)  is  Dirichlet  with  parameters  -I-  . . . ,  otijk  +  A Ty*. 

Hence, 


p(6l]\D,G)  = 


HU  ^}k+Nl}k)  1 
0 


if  0  <  0ijk  <  1  and  £[.'=1  0ljk  =  1 
otherwise 


where  ATy  =  Efc=i  Nijk- 

Sampling  from  a  Bayesian  network  with  G  known,  9  unknown,  and  D  observed  involves  assigning 
k  to  Xt  with  probability 


P(Xt  =  k  1 7 ry,  D.G)  =  (  0ijkp(0ijk  I  D,  G )  d 0ijk  =  °f  +  Er  .  (E-5) 

J  l^k'  =  l\ai]k'  +  Ntjk') 


64 


E.4  STRUCTURE  LEARNING 


Finding  the  most  likely  structure  G  that  generated  a  set  of  data  D.  The  objective  is  to  find  the 
most  likely  graph  given  data.  By  Bayes  rule, 

P(G  |  D)  tx  P(G)P(D  |  G)  =  P(G)  f  P(D  |  0,  G)p{6  \  G)  d 0 .  (E-6) 

The  previous  section  explains  how  to  compute  the  likelihood  P(D  \  6*G)  and  the  prior  p(0  \  G ). 
Cooper  and  Herskovits  [28]  show  how  to  evaluate  the  integral  above,  resulting  in 


p(G  i  d) = p{g)  n  n  p 


I  (^ijo) 


n 


+  Njjk) 


i=\j=\  I  (ab'0  +  Nij  )  £=1  l  ( aijk ) 


(E-7) 


where  NtJ  —  Ylk=\  ^ ijk •  Heckerman,  Geiger,  and  Chickering  [36]  suggest  priors  over  graphs,  but 
it  is  not  uncommon  in  the  literature  to  assume  a  uniform  prior.  For  numerical  convenience,  most 
Bayesian  network  learning  packages  calculate  and  report  log  P(G  \  D)  +  Ky  where  K  is  a  constant 
independent  of  G.  This  quantity  is  often  called  the  Bayesian  score  and  may  be  used  for  structure 
comparison  and  search. 
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